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GENERAL  GEOMETRY  PIC  FOR  MIMD 
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June  1993 


SUMMARY 

The  objectives  of  the  work  programme  specified  in  the  ProposalfSf.  namely 
the 

(i)  derivation  of  MIMD  oriented  algorithms  in  general  curvilinears. 

(ii)  development  of  a  2-D  multiblock  benchmarking  computer  program,  and 
(Hi)  execution  of  benchmarking  computations 

have  all  been  achieved.  The  conclusion  from  the  study  is  that  the  proposed 
methods  will  efficiently  use  MIMD  computers  in  the  design  and  evaluation  of 
HPM  sources  with  complex  geometries.  When  implemented  on  a  large  parallel 
computer,  the  general  geometry  PIC  schemes  will  allow  hitherto  unattainable 
accuracies  and  system  sizes  to  be  modelled. 


1  Introduction 

The  work  described  in  this  Final  Report  summarises  the  work  undertaken 
in  the  research  project  sponsored  by  the  Air  Force  Office  of  Scientific  Re¬ 
search  (AFSC)  under  Contract  F49620-92-C-0035.  "General  Geometry  PIC' 
Algorithms  and  Software  for  Distributed  Memory  MIMD  Computers".  The 
objectives  of  the  work  programme  of  this  Contract  are  described  in  the  pro¬ 
posal  [3] 

J  W  Eastwood,  A  Proposal  to  Develop  General  geometry  PIC  Algorithms 
and  Software  for  Distributed  Memory  MIMD  Computers,  RP363.  Culham 
Laboratory,  Nov  1991. 

These  objectives  may  be  summarised  as 

•  the  derivation  of  MIMD  orientated  algorithms  in  general  curvilinears. 

•  the  development  of  a  2-D  multiblock  benchmarking  computer  program, 

•  benchmarking  computations. 
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The  approach  proposed,  and  successfully  implemented  uses 

•  the  ‘Virtual  Particle'  derivation  method  [2]  applied  to  tensor  field  com¬ 
ponents, 

•  a  multiblock  spatial  decomposition  applied  to  both  fields  and  particles. 

•  transfinite  interpolation  subdivision  of  the  curvilinear  quadrilateral  multi¬ 
blocks  into  quadrilateral  elements, 

•  indirect  (“glue  patch")  addressing  between  multiblocks,  and  logical 
square  mesh  (i,j)  node  addressing  within  blocks. 

The  programme  of  work  was  divided  into  four  Tasks: 

Task  1  Prepare  a  Report  containing  the  derivation  and  statement  of  the 
general  geometry  electromagnetic  particle  in  cell  algorithms  to  be  im¬ 
plemented  in  the  2-D  benchmark  program. 

Task  2  Develop  the  single  block  solver  software  modules  for: 

(i)  explicit  time  stepping  of  Maxwell's  equations. 

(ii)  explicit  particle  time  stepping. 

(iii)  glue  patch  transformations. 

Task  3  Develop  the  2-D  multiblock  benchmarking  program. 

Task  4  Perform  demonstration  test  runs  on  the  Intel  computer  at  Phillips 
Laboratory. 

These  Tasks  and  their  outcome  are  described  in  detail  in  the  following  four 
Sections. 

In  addition  to  the  Final  Report.  Culham  undertook  to  supply  (subject  to 
IPR  conditions  specified  in  [3])  a  final  version  of  the  software  and  test  input 
and  output  data  sets.  These  have  already  been  delivered  and  installed  on 
computers  at  Phillips  Laboratory.  Conclusions  and  recommendations  from 
the  work  undertaken  on  this  Contract  are  described  in  the  final  Section. 

Whilst  the  immediate  goals  of  the  work  on  this  Contract  are  Tasks  1-4 
listed  above,  a  broader  view  has  been  taken  in  treating  them  as  steps  towards 
the  ultimate  practical  realisation  of  state-of-the-art  simulation  software  for 
three  dimensional  configurations.  The  software  is  intended  primarily  to  aid 
in  the  design  and  interpretation  of  HPM  sources  and  power  transmission 
systems,  although  the  fundamental  nature  of  the  core  software  is  such  that 
it  may  be  of  utility  in  other  computational  electromagnetic  applications. 
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2  Task  1:  Algorithm  Derivation 

The  physics,  numerical  analysis,  software  design,  the  impact  of  contemporary 
computer  architecture,  and  the  eventual  need  to  extend  to  three  dimensions 
have  all  been  taken  into  account  in  the  algorithm  derivation  and  in  the  design 
and  implementation  of  the  software.  The  approach  adopted  is  that  which  we 
believe  most  effectively  realises  the  underlying  aim  of  the  work,  and  has  the 
following  features: 


•  a  variational  finite  element  derivation,  using  tensor  fields  in  the  action 
integral  formulation  of  Maxwell's  equations. 

•  covariant  (E  and  H)  and  contravariant  (d  and  b)  electromagnetic  field 
components. 

•  orthogonal  coordinates  where  appropriate,  but  general  curvilinear  co¬ 
ordinates  where  the  extra  freedom  is  needed. 

•  multiblock  spatial  decomposition  of  complex  domains. 

•  indirect  (“glue  patch")  addressing  between  blocks. 

•  regular  ( i.j.fr )  cubic  lattice  addressing  within  blocks. 

•  transfinite  interpolation  subdivision  of  curvilinear  hexahedra)  blocks 
into  finite  elements. 

•  compact  metric  storage,  and 

•  data  and  algorithm  organisation  optimised  to  exploit  distributed  mem¬ 
ory  MIMD  computers. 


To  design  software,  which  not  only  meets  the  objectives  (see  above)  but 
also  provides  the  basis  for  a  general  curvilinear  field  solver  with  multiblock  de¬ 
composition  amenable  to  distributed  memory  MIMD  implementation,  hats  de¬ 
manded  extensive  numerical  analysis,  software  engineering  and  benchmark¬ 
ing.  In  the  course  of  this  work,  a  number  of  problems  have  been  encountered: 
for  example,  in  discretising  the  constitutive  relationships  and  boundary  con¬ 
ditions.  in  defining  an  effective  data  storage  scheme  for  handling  the  metric 
special  cases,  in  designing  for  high  computational  intensity  in  MIMD  multi¬ 
block  implementations,  and  so  forth.  All  problems  encountered  have  been 
substantially  overcome. 

A  detailed  discussion  of  the  algorithm  derivation  and  statement  is  given 
in  the  Task  1  Report 


■  • 
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J  W  Eastwood,  R  VV  Hockney  and  W  Arter,  General  Geometry  PIC  for 
MIMD  Computers  RFFX(9‘2)5‘2,  Cuiham  Laboratory,  August  1992 

which  is  appended  to  the  present  Report  as  Annex  1. 


3  Task  2:  Uniblock  Solver  Software 

The  general  geometry  VP  electromagnetic  PIC  derivation  described  in  Annex 
1  reduces  the  problem  of  modelling  a  complex  shaped  device  into  that  of 
computing  fields  and  particle  dynamics  in  a  set  of  rectangular  blocks  (in 
curved  space)  with  boundary  conditions  applied  only  at  the  surfaces  of  the 
blocks.  Boundary  conditions  are  implemented  by  copying  data  from  the 
source  block  surfaces  (currents  for  Ampere's  Law.  electric  fields  for  Faraday's 
Law  and  particle  coordinates  for  the  equations  of  motion)  to  the  source  glue 
patch  buffers,  performing  appropriate  transformation  of  data  in  the  gluepatch 
buffers,  and  then  copying  the  results  from  the  target  gluepatch  buffers  to  the 
target  blocks. 

Each  uniblock  of  the  multiblock  decomposition  behaves  as  an  indepen¬ 
dent  computation  with  boundary  conditions  provided  by  surface  boundary 
condition  patches  or  gluepatches.  This  independence  is  reflected  in  the  cod¬ 
ing  of  the  uniblock  subprograms,  which  are  written  in  terms  of  the  local  block 
indexing  and  local  position  coordinates. 

The  objective  of  Task  2  was  to  write  Fortran  code  for  the  uniblock  cal¬ 
culations.  These  subprograms  may  be  divided  as  follows: 

(i)  electromagnetic  routines 

(ii)  electromagnetic  boundary  condition  routines 

(iii)  particle  routines 

(iv)  particle  boundary  condition  routines. 

Annex  2  lists  the  documentation  modules  from  the  Fortran  benchmark 
program  delivered  to  Phillips  Laboratory.  The  uniblock  subprograms  de¬ 
scribed  below  are  provided  as  part  of  the  benchmark  program.  The  OLYM¬ 
PUS  conventions  for  notation,  layout  and  documentation  [1]  have  been  fol¬ 
lowed.  Subprograms  with  the  same  names  as  those  in  the  prototypical 
timestepping  program  CRONUS  have  the  same  function  as  the  CRONUS 
dummy  counterparts. 

In  referring  to  subprograms  in  the  following  subsection  the  notation  (c.s)name 
will  be  used,  so  for  example  (2.20)  AMPERE  will  refer  to  the  main  calcula¬ 
tion  (class  2)  routine,  subroutine  number  20,  which  is  stored  in  file  c2s20.f. 
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The  contents  of  file  c2s'20.f  will  be  subroutine  AMPERE,  which  computes  the 
displacement  current  (cf  documentation  file  INDSUB.doc  in  Annex  2).  Sim¬ 
ilarly,  common  blocks  have  their  group  and  member  numbers;  for  example. 
[C4.5]  COMADP  is  a  member  of  group  4  (housekeepings  blocks  with  name 
COMADP.  etc. 


3.1  Electromagnetic  routines 

The  principal  routines  for  advancing  Maxwell  s  equation  in  each  uniblock  :.r~ 

(2.20)  AMPERE  which  computes  scaled  contravariant  displacement  currents. 
d‘.  according  to  eq(4.6.5)  of  Annex  1. 

(2.21)  FARADA  which  updates  scaled  contravariant  magnetic  fields,  b' .  ac¬ 
cording  to  eq(4.6.S)  of  Annex  1. 

(2.22)  GBTOH  which  computes  covariant  magnetic  field  intensities.  //,.  from 
b‘  (eq( 4.5.1 )  of  Annex  1). 

(2.23)  GDTOH  which  computes  covariant  electric  fields.  E,.  from  d'  (eq(4.5.2) 
of  Annex  1 ). 

Input  and  output  to  these  routines  is  via  formal  parameters.  Included 
common  blocks  are  [Cl. 9]  COMDDP  and  [C4.5J  COMADP.  The  first  of  these  is 
to  give  access  to  OLYMPUS  development  and  diagnostic  parameters  for  code 
testing,  and  the  second  contains  symbolic  names  of  addressing  constants.  The 
use  of  symbolic  addressing  constants  was  adopted  to  allow  dimensionality  and 
data  storage  layout  to  be  changed  without  recoding  the  uniblock  routines. 

The  remaining  uniblock  electromagnetic  routines  are  general  mesh  ma¬ 
nipulation  routines,  applicable  to  any  vector  field  defined  on  the  element  net. 
These  routines  perform  *he  following  functions. 

(1.21)  NILVEC  sets  all  nodal  amplitudes  of  a  vector  field  on  the  element  net 
to  zero 

(1.20)  SETVEC  sets  nodal  amplitudes  to  a  constant  vector 

(2.26)  ADDVEC  adds  two  vector  fields  together 

(2.34)  CPYVEC  copies  one  vector  field  to  another 

(2.36)  AVEVEC  averages  two  vector  fields. 

Specifications  of  the  input  to  and  output  from  these  electromagnetic  rou¬ 
tines  of  the  uniblock  solver  software  are  given  in  the  nine  subsections  of 
Appendix  A.l.  These  routines  can  be  used  as  they  stand  for  both  two  and 
three  dimensional  calculations. 
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3.2  Electromagnetic  boundary  condition  routines 

The  uniblock  electromagnetic  boundary  condition  routines  (2.25)  BCOPAT 
and  (2.28)  BCONE  are  used  respectively  to  apply  external  boundary  condi 
tions  on  the  displacement  field  and  electric  field.  The  present  versions  of 
these  routines  can  apply  conductor,  applied  field  and  isotropic  resistive  wall 
boundary  conditions  for  element  nets  which  are  orthogonal  at  the  external 
boundaries.  The  routines  are  written  to  work  in  both  two  and  three  dimen¬ 
sions. 

Internal  boundary  conditions  between  blocks  are  handled  by  the  gluepatch 
routine  (2.30)  GLUEIO.  This  routine  works  for  othogonal  and  non-orthogonal 
in  both  two  and  three  dimensions. 

A  specification  of  the  input  to  and  output  from  the  uniblock  electro¬ 
magnetic  boundary  condition  routines  listed  above  is  given  in  the  Appendix, 
subsection  A. 2. 


3.3  Particle  routines 

The  principal  particle  integration  routines  are 

(2.11) M0VCUR  .  which  updates  the  part  icle  position  coordinates  and  assigns 
current  to  the  element  net  according  to  eqs(5.2.9)-(5.2.12)  of  Annex  1 

(2.12) ACCEL  .  which  updates  particle  momenta  using  the  scheme  described 
in  Section  5.4.2  of  Annex  1. 

MOVCUR  calls  subsidiary  routine  MERGE  in  computing  the  location  of 
the  virtual  particles,  and  calls  ASSCUR  to  assign  current  from  the  particles 
to  the  element  net. 

A  specification  of  the  input  to  and  output  from  these  uniblock  particle 
routines  is  given  in  the  Appendix,  subsection  A. 3.  The  first  subprogram 
listed  therein,  (2.10)  SETCUR  reinitialises  the  current  accumulation  arrays 
each  timestep. 


3.4  Particle  boundary  condition  routines 

Internal  particle  boundary  conditions  and  particle  absorption  at  external 
boundaries  are  handled  by  the  gluepatch  routine  (2.14)PARTIO  of  the  multi¬ 
block  test  program  (cf  Section  3).  Particle  injection  is  handled  by  (2.19) 
INJECT,  which  uses  the  uniblock  routine  (2.18)QSHARE  to  find  charge  den¬ 
sity  at  cathode  surfaces  and  (2.13)EMITEL  to  emit  electrons  from  the  cathode 
surface  using  a  space  charge  limited  emission  algorithm. 
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Table  1:  Global  Addrfssmg  Arrays 


Name 

Meaning 

MPESTB 

MPORTB 

MBKPES 

MPRBLK/NXTBLK 

MPATBK 

MBKPAT /NXTPAT 
MBKTYP 
MBCPBK 

MBKBCP/NXTBCP 

MPATBK 

MBKPAT/NXTPAT 

Processor  to  process  pointer  table 

Process  to  processor  pointer  table 

Block  to  process  pointer  table 

Process  to  block  pointer  header  and  link  tables 

Patch  to  block  pointer  table 

Patch  to  block  pointer  header  and  link  tables 

Block  to  blocktype  pointer  table 

Boundary  condition  (be)  patch  to  block  pointer  table 
Block  to  be  patch  pointer  header  and  link  tables 
Gluepatch  to  block  pointer  table 

Block  to  gluepatch  pointer  header  and  link  tables 

A  specification  of  the  input  to  and  output  from  these  uniblock  particle 
boundary  condition  routines  is  given  in  the  Appendix,  subsection  A.l. 


4  Task  3:  Multiblock  Test  Program 

The  Multiblock  Test  Program  provides  the  testbed  in  which  the  uniblock 
modules  described  in  the  previous  section  are  combined  with  control  and 
message  passing  routines  to  form  the  MIMD-PIC  test  program. 

The  main  control  routine  which  calls  the  '  irious  uniblock  routines  is 
(2.1)STEP0N.  STEPON  calls  further  routines  (2.24)BCSURD.  (2.26)BCSURE 
and  (  2.29)BCSYM  to  control  the  application  of  external  electromagnetic 
boundary  conditions,  and  (2.32)BLKIO  to  handle  the  transformation  and 
copying  of  electromagnetic  gluepatch  data  to  and  from  the  gluepatch  buffer 
arrays.  Particle  data  is  copied  to  and  from  gluepatch  buffer  arrays  by 
(2.14)PARTIO.  Particle  injection  boundary  conditions  are  controlled  by  (2.19)IN  JECT. 


4.1  Data  Organisation 

Global  and  local  data  organisation  in  the  program  both  follow  the  scheme 
outlined  in  Annex  1.  Table  1  summarises  the  array  names  for  pointer  ta¬ 
bles  connecting  processor  to  process,  process  to  block,  block  to  blocktype. 
block  to  boundary  patch  and  block  to  gluepatch.  In  most  cases  there  are 
two-way  pointers,  and  in  instances  where  two  names  are  given,  pointers  are 
implemented  as  linked  lists. 

Global  data  have  the  same  values  on  all  processes.  Local  data  have  differ¬ 
ent  values  on  different  processes.  Local  data  includes  the  gluepatch  to  buffer 
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pointers  (MGLTOB  and  MBTOGL),  field  and  particle  addressing  data,  and 
the  field  and  coordinate  values. 

Field  data  for  each  uniblock  is  mapped  onto  one  dimensional  Fortran 
arrays  as  described  in  Section  7.4  of  Annex  1.  Uniblock  subprograms  (cf 
Section  3)  are  written  relative  to  origin  1  in  the  field  and  addressing  arrays. 
The  relevant  origin  location  in  the  multiblock  arrays  for  each  block  is  passed 
to  the  subprograms  by  calling  with  the  appropriate  offsets.  This  can  be  seen 
by  inspection  of  (2.1)STEP0N.  For  example 


C 

CL  1.2  clear  current  arrays 

CALL  SETCUR( 

♦  LBLAS  (LOBLAS(IBTYPE)) , 

♦  C  (LORFBL(IBLOCK))  ) 

C 


passes  the  block  addressing  information  in  array  LBLAS  for  block  type  IB- 
TYPE.  and  sets  currents  C  to  zero  for  block  1BL0CK. 

Similarly,  particle  address  and  coordinates  are  stored  in  one  dimensional 
arrays  LPARAS  and  COORDS,  respectively.  Offsets  for  each  block  are  passed 
to  uniblock  routines  by  calling  with  the  pointer  to  the  location  of  origins  of 
coordinate  (LOCOOR)  to  the  current  block,  for  example 


C 

C 


C 


move 

CALL  MOVCURfO, 
LPARAS(LOPARA(IBLOCK) ) , 

♦  SPATR, 

+  LEC0VA(1 .LOECOA(IBTYPE) ) f 

+  EC0V(L0EC0V(IBTYPE)) , 

+  COORDS (LOCOOR(IBLOCK)) , 

+  GPATI , 

+  GPATI , 

♦  LBLAS  (LOBLAS(IBTYPE) ) , 

♦  C  (LORFBL(IBLOCK))  ) 


For  further  information  on  the  data  storage,  see  Annex  2  and  the  docu¬ 
mented  program  listing. 
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4.1.1  Program  Structure 

The  benchmark  program,  MIMD-P1C.  follows  the  canonical  notation  and 
structure  of  the  skeleton  initial  value/boundary  value  timestepping  program 
CRONUS  [1].  Program  flow  is  controlled  by  (0.3}COTROL,  the  main  timestep 
loop  is  controlled  by  (2.1)STEP0N  and  output  is  controlled  by  (3.1)0UTPUT. 

The  decimal  numbered  subsections  of  (2.1)STEP0N  reflect  the  steps  of 
the  timestep  loop:- 

(1)  move  particles  and  compute  currents 
loop  blocks  or  process  (1.1) 

(1.2)  initialise  current  (SETCUR) 

(1.3)  inject  particles  (INJECT) 

(1.4)  move  and  accumulate  current  (MOVCUR) 

(1.5)  sort  lost  particles  to  giuepatch  buffer  (PARTIO) 
end  loop 

loop  to  empty  giuepatch  buffers 

(1.6)  exchange  particles  (XPART) 

(1.7)  complete  move  for  exchanged  particles 
loop  blocks  on  process 

move  particle  from  buffer  to  block  (MOVCUR) 
sort  lost  particles  to  giuepatch  buffer  (PARTIO) 
end  loop 
end  loop 

(2)  update  displacement  field 

loop  blocks  on  process 

(2.1)  compute  H  from  b  (GBTOH) 

(2.2)  compute  displacement  current  (AMPERE) 

(2.3)  load  giuepatch  buffers  (BLKIO) 
end  loop 

(2.4)  exchange  giuepatch  buffers  (XPATCH) 
loop  blocks  on  process 

(2.5)  copy  gluepatches  to  blocks  (BLKIO) 
end  loop 

(2.6)  apply  boundary  conditions  to  d  (BCSYM,  BCSURD) 

(3)  compute  new  E  field 

loop  blocks  on  process 

(3.1 )  compute  E  from  d  (GDTOE) 

(3.2)  load  giuepatch  buffers  (BLKIO) 
end  loop 

(3.3)  exchange  giuepatch  buffers  (XPATCH) 
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loop  blocks  on  process 

(3.4)  copy  glue  patches  to  blocks  (BLKIO) 
end  loop 

(3.5)  apply  boundary  conditions  to  E  (BCSVM,  BCSURE) 

(4)  advance  magnetic  fields 

loop  blocks  on  process 

save  old  6  (CPYVEC) 
advance  b  (FARADA) 
compute  time  centred  b  (AVEVEC) 

(5)  accelerate  particles 

ACCEL 

end  loop 

The  numbering  in  this  summary  of  the  timestep  loop  corresponds  to  the 
section  numbers  in  ('2.1)  STEPON.  and  the  names  in  brackets  are  the  sub¬ 
program  names. 

4.2  Test  Cases 

The  source,  executables  and  test  data  for  the  workstation  version  of  MIMD 
PIC  has  been  installed  in  directory  mayl2.d on  the  SI  N  workstation  ppws04 
at  Phillips  Laboratory.  This  version  differs  primarily  from  the  iPSC  version  in 
that  calls  to  Intel  interprocessor  communications  routines  have  been  replaced 
by  dummies.  To  execute  the  program  using  one  of  the  test  datasets,  e.g. 
It $117. dot  type 

xmimdpic  testl7.dat 

If  the  xgho$l  library  has  been  linked  into  imimdpic.  then  a  window  with 
graphics  output  will  appear,  otherwise  only  printer  output  file  oJtstllpl , 
restart  binary  file  rJe$tl7pl  and  graphics  output  file  gjfsll7pl  will  be  pro 
duced.  The  graphics  file  may  be  viewed  using  the  GHOST  interactive  viewer 
program  ighost{ 5]. 

The  test  data  sets  included  in  the  directory  mayl2.d  are 

tesll.dat  :  coded  exchange  test 

lest 2. dal  current  and  d  array 

tesl3.dat  :  transmission  line  test/constant  d 
tesl4.dat  :  transmission  line  test/travelling  wave 
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test5.dat  3-D  coded  current  array 

test6.dat  3-D  field  test 

test7.dat  particle  mover  test 

testS.dat  particle  mover  test 

tcst9.dat  :  periodic  mover  test 

test10.dat  cyclotron  orbit 

testll.dat  cyclotron  with  E  x  B 

testl2.dat  current  assignment  check 

testl3.dat  short  MITL  test 

testl4-dat  longer  MITL 

testl5.dat  Further  E  x  B  test 

testl6.dat  :  1  cavity  device/no  particle 

testl7.dat  :  5  cavity  MILO/few  particles 

testlS. dot  :  4  cavity  MILO/50  steps 

testl9.dat  :  Symmetry  be  EM  transmission  line 

test20.dat  :  test  error  in  G2LMAT 

test2l.dat  particle  reflection  be  test 

test22.dat  uniform  d  start  5  cavity  milo/sheat  plot 

test23.dat  1  block  MILO  test 

tist24.dat  :  20  block  MILO  test /5000  steps 

tcst25.dat  20  block/finer  mesh  MILO  15.000  steps 

test26.dat  04  block  MILO 

test31.dat  3-D  version  of  test  1 

tcst33.dat  3-D  version  of  test  3 

test34-dnt  3-D  version  of  test  4 

The  input  and  output  dataset  for  these  test  cases  are  in  directory  may  12. d 
under  user  eastwood  on  the  ppxrs04  machine,  input  dataset  testnn.dat  has  a 
corresponding  output  file  ojestnnpl.  GHOST  graphical  output  file  gjesfnnpl 
and  restart  file  rJestnnpl.  (The  suffix  'pi'  is  for  a  one  processor  run.  Output 
from  m  processor  runs  on  the  iPSC  have  suffices  pm:r  for  the  rth  processor 
output  from  an  m  processor  hypercube). 

The  input  and  output  datasets  are  largely  self  explanatory,  so  will  not  be 
described  further  here.  The  graphical  output  files  may  be  viewed  either  using 
the  GHOST  interactive  X-window  viewer  xghost.  or  by  converting  them  to 
(say)  Postscript  files  and  printing  them. 

5  Task  4:  iPSC  Parallel  Benchmarking 

The  parallel  implementation  of  the  multiblock  program  differs  only  in  the 
handling  of  the  gluepatch  buffer  exchange  between  uniblocks.  In  the  se¬ 
rial  code,  gluepatch  exchange  involves  only  memory  to  memory  copying. 
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Figure  1:  A  snapshot  of  the  radial  fields  and  electron  distributionsfrom  the 
MILO  test  run  testS5.  The  electric  fields  are  measured  along  lines  parallel 
to  the  axis  of  the  device  through  the  centre  of  the  drift  space  (green/black 
curve),  and  half  way  up  the  cavities  (red/blue  curve).  Fields  are  in  units  of 
applied  field,  and  distance  is  in  element  widths. 
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The  parallel  code  uses  memory  to  memory  copying  for  gluepatches  between 
uniblocks  on  the  same  processor,  and  uses  interprocessor  message  passing 
between  blocks  on  different  processors.  The  gluepatch  exchange  subprogram 
(2.31)XPATCH  contains  explicit  calls  to  iPSC  routines,  and  in  the  serial  ver¬ 
sion,  these  routines  are  replaced  by  dummies  (cf  file  cpsl.f). 

The  test  cases  described  below  were  chosen  firstly  to  show  that  the  parallel 
implementation  worked  correctly,  and  secondly  to  evaluate  the  performance 
enhancements  that  could  be  achieved  by  parallel  processing. 


5.1  Test  problems 

The  test  problems  for  the  parallel  benchmarking  were  chosen  subject  to  the 
constraints  that 

•  they  use  simple  geometrical  elements,  since  the  general  metric  element 
computation  routines  have  not  been  incorporated  in  the  benchmark 
software. 

•  they  reflect  realistic  engineering  types  of  calculations  where  the  do¬ 
mains  are  not  simply  rectangles,  and  the  particle  filling  is  nonuniform. 

•  the  results  can  be  cross  checked  with  those  from  existing  serial  codes. 

These  constraints  led  to  a  planar  MILO  configuration  being  chosen  for  the 
example  geometry. 

Typically.  MILO  calculations  are  started  from  an  empty  device  with  zero 
fields,  and  an  electric  field  is  applied  at  the  generator  boundary.  Computa¬ 
tions  are  then  run  for  several  tens  of  thousands  of  step.  Figure  1  shows  the 
axial  electric  fields  and  electron  distribution  for  one  such  calculation  after 
12,000  steps.  This  computation  showed  the  same  behaviour  as  a  cross  check 
using  the  MAGIC  code  [6] 

The  two  test  cases  for  the  parallel  computations  are  the  same  as  two  of  the 
test  cases  listed  in  Table  1.  The  input  datasets  casel.dat  and  case2.dat  are 
the  parallel  computation  equivalents  of  testl7.dat  and  test24.dat,  respectively. 
The  datasets  for  the  parallel  runs  differ  from  the  serial  ones  only  in  the 
addition  of  an  extra  data  input  variable  to  specify  the  number  of  processors 
to  be  used  for  the  computation. 

casel.dat  (testl7.dat)  is  a  small  calculation  example,  which  describes  a 
five  cavity  MILO  using  12  uniblocks  with  coarse  element  nets.  The  100  step 
run  follows  the  standard  MILO  startup  from  an  empty  field-free  configura¬ 
tion.  Large  electron  superparticles  are  used  so  that  by  the  end  of  the  100 
step  run  there  are  less  than  140  particle  in  the  MILO.  Running  the  profiler 
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Figure  2:  The  streak  plot  of  particle  trajectories  for  iPSC  run  using  dataset 
casel.dat  shows  the  correct  passing  of  particle  coordinates  between  uniblocks 
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on  the  workstation  version  of  the  code  for  this  test  case  showed  that  the  mix 
of  work  was  uncharacteristic  of  particle  calculations.  The  breakdown  of  the 
main  elements  of  the  timestep  cycle  was  as  follows: 

34%  PARTIO  and  BLKIO 

25%  AMPERE.  GDTOE,  FARADA  and  GBTOH 

14%  MOVCUR  and  ACCEL 

The  dominant  part  of  the  calculation  cycle  is  the  work  to  exchange  data 
between  block,  followed  by  the  work  in  the  electromagnetic  field  calculation. 

case2.dat  (test24.dat)  is  a  medium  sized  calculation.  It  was  chosen  to 
reflect  the  general  characteristics  of  a  production  calculation,  but  in  a  run 
which  takes  only  a  ew  (100)  timesteps.  This  differs  from  the  normal  MILO 
runs  in  that  the  device  is  initially  filled  with  a  nonzero  electric  and  magnetic 
field.  The  result  is  that  electrons  are  emitted  from  the  whole  length  of  the 
cathode,  and  there  is  a  strong  initial  transient  where  the  electrons  fill  only 
part  of  the  device  volume.  The  percentages  of  the  calculation  time  taken  by 
the  patch  exchange,  electromagnetic  and  particle  parts  are  now 

6%  PARTIO  and  BLKIO 

12%  AMPERE.  GDTOE,  FARADA  and  GBTOH 

73%  MOVCUR  and  ACCEL 

This  ordering  of  the  amount  of  work  is  more  typical  of  realistic  particle 
computations,  where  the  particle  integration  dominates.  The  interblock  data 
transfer  routines  are  now  a  small  part  of  the  serial  calculation.  Since  it  is  only 
this  part  of  the  calculation  which  involves  message  passing  in  the  parallel 
implementation  the  scope  for  parallel  speedup  for  cast  J. dal  will  be  much 
greater  than  for  cast/. 

Further  cases,  where  the  benchmark  code  is  extended  to  undertake  three 
dimensional  calculations,  are  planned  for  the  future.  On  the  basis  of  com¬ 
putational  intensity  estimates,  we  anticipate  much  greater  scope  for  speedup 
than  can  be  achieved  in  two  dimensions.  3-D  tests  would  be  invaluable  in 
evaluating  the  capabilities  of  machines  such  as  the  Intel  Paragon  for  use  in 
large  scale  electromagnetic  computations. 


5.2  Benchmark  results 

Case-1  has  been  used  as  a  validation  example  throughout  the  development 
of  the  parallel  code,  because  it  contains  particles  which  pass  through  several 
blocks  belonging  to  different  processors.  To  demonstrate  that  these  compli¬ 
cated  cases  are  being  correctly  computed.  Fig.  2  shows  the  orbits  of  particles 
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in  which  the  colour  changes  as  the  particles  are  computed  by  different  proces¬ 
sors.  Such  a  colour  change  indicates  that  the  particle  coordinates  are  being 
correctly  transferred  by  way  of  a  message  containing  their  coordinates  being 
sent  from  one  processor  to  another.  We  also  note  that  the  orbits  do  not  stop 
at  the  non-physical  boundary  between  the  regions  computed  by  different  pro¬ 
cessors.  as  would  be  the  case  if  the  message  transfer  between  processors  was 
not  operating  correctly. 

Figure  3  shows  the  scaling  charcteristics  of  case- 1  when  run  on  an  iPSC/860 
with  up  to  16  processors,  together  with  the  idea!  speedup  line.  The  unit  of 
performance  used  is  timesteps  per  second  (tstep/s),  and  the  graph  shows 
how  the  performance  scales  for  a  fixed  size  problem  as  the  number  of  proces¬ 
sors  increases.  This  unit  of  performance  is  to  be  preferred  to  the  traditional 
Speedup  (ratio  of  p-processor  performance  to  one  processor  performance), 
because  it  retains  the  absolute  speed  of  the  calculation  which  is  of  more  in¬ 
terest  to  the  user  than  a  speedup  ratio.  The  ideal  linear  speedup  line  is  also 
shown  for  the  case  with  output  to  the  cube  disk  and  additional  diagnostic 
output  switched  on  (NLREPT=T):  this  is  the  theoretical  performance  which 
would  be  obtained  if  the  performance  scaled  up  linearly  with  the  number  of 
processors.  This  curve  will  not  be  reached  in  practice  because  of  load  imbal¬ 
ance,  communication  overheads,  and  essentially  serial  code  that  cannot  be 
parallelised.  The  latter  includes  any  code  that  is  repeated  for  convenience  in 
all  processors,  and  will  eventually  cause  the  performance  to  saturate  (Amdahl 
saturation)  and  even  subsecpiently  decrease  with  the  number  of  processors. 

Three  curves  are  shown  depending  on  whether  the  printed  output  is  re¬ 
turned  to  the  host  (bottom  two  curves  marked  to  HOST ).  or  whether  it  is  sent 
to  the  to  the  disks  that  are  directly  connected  to  the  iPSO/860  hypercube 
(top  curve  marked  to  CUBE).  The  output  is  controlled  by  the  variable  NL- 
REPT=T  (full  report)  or  NLREPT  =  F  (minimal  report).  As  already  noted. 
case-1  is  a  small  test  calculation  used  in  program  development,  and  is  too 
small  to  make  efficient  use  of  a  large  MPP.  It  is  not  surprising,  therefore, 
that  the  performance  scaling  is  poor.  All  curves  show  the  onset  of  Amdahl 
saturation,  due  to  repeating  the  output  in  all  processors,  but  the  importance 
of  using  the  CUBE,  rather  than  HOST  file  system  is  evident. 

Figure  4  shows  the  scaling  behaviour  for  the  medium  sized  case-2,  com¬ 
piled  under  both  if77  (UNOPTIMISED)  and  if77  -0  (OPTIMISED).  The 
ideal  linear  speedup  is  fitted  to  the  optimised  case.  The  gain  obtained  us¬ 
ing  compiler  optimisation  is  clearly  seen.  Although  the  performance  can  be 
seen  to  fall-off  from  the  linear  speedup  line,  the  scaling  can  be  regarded  as 
reasonable  for  a  problem  of  this  size.  Even  when  there  are  only  four  blocks 
per  processor,  more  than  75  per  cent  of  the  ideal  linear  speedup  is  realised. 
Unless  the  uniblock  are  further  subdivided,  saturation  for  this  case  must  be 
reached  by  64  processors.  However,  for  the  type  of  calculation  for  which  this 
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Figure  3:  Temporal  Performance  in  timestep  per  second  (tstep/s)  for  case-1. 


software  is  designed,  we  would  not  expect  to  meet  performance  saturation 
until  several  hundred  processors  were  used.  More  data  is  recpiired  is  required 
for  the  larger  cases  to  determine  the  best  buffering  strategy,  and  adapt  tb»» 
software  to  return  the  best  performance.  This  require*  access  to  an  Intel 
Paragon  with  at  least  200  processors,  and  preferably  more. 

6  Final  Remarks 

The  objectives  of  the  work  programme  specified  in  the  Proposal[3],  namely 
the 

(i)  derivation  of  MIMD  oriented  algorithms  in  general  curvilinears. 

(ii)  development  of  a  2-D  multiblock  benchmarking  computer  program,  and 

(iii)  execution  of  benchmarking  computations 
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have  all  been  achieved.  In  some  aspects  the  results  achieved  have  ex¬ 
ceeded  requirements.  For  instance,  the  uniblock  routines  (Section  3)  and  the 
multiblock  test  program  (Section  4)  can  already  be  used  in  their  present  form 
for  both  two  and  three  dimensional  cases.  Although  the  software  is  suitable 
for  parallel  benchmarking,  much  work  remains  before  it  becomes  a  usable 
tool  for  microwave  device  simulation;  the  initialisation  is  difficult  to  use  in 
its  present  form,  and  still  lacks  the  subprograms  to  compute  metric  tensor 
elements  for  general  curvilinear  uniblocks;  also  the  output  from  the  code  is 
quite  limited.  Further  development  and  a  considerable  amount  of  validation 
is  required  before  the  software  can  be  regarded  as  sufficiently  debugged  for 
routine  microwave  computations. 

Most  of  the  core  timestepping  routines  for  the  three  dimensional  extension 
of  the  benchmarking  code  are  complete.  Known  exceptions  to  this  are 

(i)  the  ability  to  handle  general  curvilinear  external  boundary  conditions 
on  the  electromagnetic  fields. 

(ii)  non-lumped  approximations  to  the  electromagnetic  equations. 

(iii)  three  dimensional  space  charge  limited  and  beam  emission  particle 
boundary  conditions. 

It  emerged  during  the  implementation  of  the  particle  integration  routines 
that  a  more  efficient  particle  momentum  integration  may  result  from  using 
local  non-orthogonal  coordinates  rather  than  local  cartesians;  we  recommend 
that  the  question  as  to  which  approach  is  most  effective  is  resolved  before 
further  extension  of  the  particle  software  is  undertaken. 

The  results  of  the  preliminary  benchmark  computations  showed  encour¬ 
aging  speed  up.  even  for  modestly  sized  calculations.  The  present  implemen¬ 
tation  bases  message  passing  on  a  patch  to  patch  basis.  On  machines  with 
high  interprocessor  message  passing  latency,  further  speed  up  would  result 
from  presorting  the  patches  and  performing  message  passing  on  a  processor 
by  processor  basis  (cf  below). 

6.1  The  LPM2  benchmark 

The  results  reported  in  section  5.2  show  that  the  benchmark  version  of  the 
new  parallel  code  works,  and  can  be  used  to  test  the  scaling  behaviour  of 
Massively  Parallel  Processors  (MPPs)  that  may  be  considered  for  acquisition 
by  the  I'SAF.  both  now  and  in  the  future.  The  most  useful  way  of  doing  this 
is  to  offer  the  benchmark  (which  we  have  called  LPM2  for  Local-Particle- 
Mesh  #2)  as  a  component  of  a  widely  disseminated  benchmark  set;  this  will 
result  in  performance  numbers  being  produced  by  the  manufacturers  as  a 
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matter  of  course  on  all  their  new  computers.  Thus  the  I'SAF  would  see  that 
performance  of  one  of  their  important  class  of  codes  quoted  without  having 
to  take  any  action,  much  in  the  same  way  that  UNPACK  benchmark  results 
currently  appear. 

A  recent  initiative  by  large  scale  computer  users  was  taken  at  Super- 
computing92  for  exactly  this  purpose  and  has  held  three  meetings.  This 
committee,  called  the  ParkBench  committee  (Parallel  Kernel  Benchmarks), 
aims  to  specify  a  set  of  public-domain  benchmarks  for  the  evaluation  of  par¬ 
allel  systems  and  MPPs  that  is  acceptable  to  both  the  user  community  and 
the  manufacturers.  This  committee  is  currently  chaired  by  Professor  Hock¬ 
ney,  and  is  looking  for  a  parallelised  PIC  code,  which  expands  up  to  MPP 
sized  problems.  In  our  view  it  would  be  in  the  long-term  advantage  of  the 
USAF  to  put  the  LPM2  benchmark  code  into  the  public  domain  and  offer  it 
to  fulfill  this  role  in  the  ParkBench  benchmark  suite.  If  I'SAF  is  agreeable  to 
this  action  Professor  Hockney  will  undertake  to  submit  LPM2  to  ParkBench 
and  hopefully  have  it  encorporated  in  the  benchmark  suite. 

In  order  to  test  the  behaviour  of  the  new  parallel  code  fully  it  is  necessary 
to  run  it  on  a  real  machine  with  several  hundred  processors.  This  should  be 
a  Paragon,  and/or  competitive  computer  (e.g.  Meiko  CS2.  currently  being 
considered  by  LLL).  This  will  determine  the  extent  of  performance  saturation 
which  occurs  with  the  present  code.  There  are  steps  that  can  then  be  taken  to 
improve  the  performance  at  saturation,  and  to  delay  the  onset  of  saturation. 
However  without  access  to  a  large  MPP.  the  timings  necessary  for  such  an 
optimisation  cannot  be  performed.  We  place  high  priority  on  obtaining  such 
measurements  as  the  first  stage  of  the  proposed  future  work. 

Early  measurements  on  the  Intel  Paragon  [7]  show  that,  although  the 
asymptotic  bandwidth  of  the  Paragon  is  about  eight  times  that  of  the  iPSC/860 
the  message  startup  time  is  about  twice  as  long  as  that  of  the  Intel  iPSC’/860. 
Since  the  arithmetic  processors  used  on  the  two  computers  are  the  same,  the 
Paragon  will  show  improved  performance  over  the  iPS('/8(>0  only  if  a  few 
long  rather  than  many  short  messages  are  sent.  This  puts  a  premium  on 
merging  many  small  messages  into  one  large  one  and  then  sorting  the  results 
after  the  message  has  arrived.  The  current  code  does  not  exploit  this  pos¬ 
sibility.  and  we  recommend  that  one  line  of  future  development  be  to  insert 
such  message  merging  and  sorting  code. 


-  • 
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A.l  Electromagnetic  routines 

A. 1.1  Subroutine  AMPERE 

External  Routines 

EXPERT 

Intrinsic  Functions 

MAX 
MOD 

Arguments  on  Entry 

C  Real  array 

H  Rtal  array 

KBLAS  Integer  array 


contra  variant  current 
covariant  magnetic  intensity 
block  addressing  structure 


Inputs  Through  Common 


Common  Block  /COMADP/ 


MINCO  Integer 
MODKEY  Integer 
MOFSET  Integer 
MSIZO  Integer 

MSPACE  Integer 
MXPDIM  Integer 


Mesh  INCrement  Origin  in  BLAS 

Mesh  orthogonality  and  dimension  key  location 

Mesh  OFfSET  location  in  BLAS 

Mesh  SIZe  Origin  in  BLAS 

Mesh  SPACE  reserved  location  in  BLAS 

Max  number  of  physical  dimensions!  =3) 


Common  Block  /COMDDP/ 
NL0MT2  Logical  array 


Arguments  of  Called  Routines 

ICLASS  Integer  argument  r>f  EXPERT 

ISUB  Integer  argument  of  EXPERT 
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Arguments  on  Exit 

DDOT  Real  array  contravariant  displacement  current 

Outputs  Through  Common 

Common  Block  /COMADP/ 

NONE 

Common  Block  /COMDDP/ 

NONE 
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A. 1.2  Subroutine  FARADA 
External  Routines 
EXPERT 

Intrinsic  Functions 

MAX 

MOD 

Arguments  on  Entry 

B  Real  array  scaled  contra  variant  magnetic  field 

E  Rial  array  covariant  electric  field 

KB  LAS  Integer  array  block  addressing  struct  tin* 


Inputs  Through  Common 


Common  Block  /COMADP/ 


MINCO  Integer 
MODKEY  Integer 
MOFSET  Integer 
MSIZO  Integer 

MSPACE  Integer 
MXPDIM  Integer 


Mesh  I.NCrement  Origin  in  BI.AS 

Mesh  orthogonality  and  dimension  key  location 

Mesh  OFfSET  location  in  BI.AS 

Mesh  SIZe  Origin  in  BLAS 

Mesh  SPACE  reserved  location  in  BLAS 

Max  number  of  physical  dimensions!  =3) 


Common  Block  /COMDDP/ 
NLOMT2  Logical  array 


Arguments  of  Called  Routines 

ICLASS  Integer  argument  of  EXPERT 
ISUB  Integer  argument  of  EXPERT 

Arguments  on  Exit 

B  Real  array  scaled  contravariant  magnetic  field 
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Outputs  Through  Common 

Common  Block  /COMADP/ 
NONE 

Common  Block  /COMDDP/ 
NONE 
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A. 1.3  Subroutine  GBTOH 

External  Routines 

EXPERT 


Intrinsic  Functions 

MAX 

MOD 


Arguments  on  Entry 


B 

GBH 

KBLAS 

KGBHAD 


Real  array 
Real  array 
Integer  array 
Integer  array 


scaled  contravariant  magnetic  field 
b'  to  H,  conversion  tensor 
block  addressing  structure 
GBH  addressing  structure 


Inputs  Through  Common 


Common  Block  / COMADP/ 


MINCO  Integer 
MINCOG  Integer 
MODKEY  Integer 
MOFSET  Integer 
MSIZO  Integer 

MSPACE  Integer 
MXPDIM  Integer 


Mesh  IXCrement  Origin  in  BLAS 

Mesh  INC'reinent  Origin  in  G  addressing 

Mesh  orthogonality  and  dimension  key  location 

Mesh  OFFSET  location  in  BLAS 

Mesh  SIZe  Origin  in  BLAS 

Mesh  SPACE  reserved  location  in  BLAS 

Max  number  of  physical  dimensions!  =3) 


Common  Block  /COMDDP/ 
NL0MT2  Logical  array 


Arguments  of  Called  Routines 

ICLASS  Integer  argument  of  EXPERT 

ISUB  Integer  argument  of  EXPERT 
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P 

I 


Arguments  on  Exit 

H  Real  array  covariant  magnetic  intensity 

Outputs  Through  Common 

Common  Block  /COMADP/ 

NONE 

Common  Block  /COMDDP/ 

NONE 
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A. 1.4  Subroutine  GDTOE 

External  Routines 

EXPERT 

Intrinsic  Functions 

MAX 

MOD 


Arguments  on  Entry 


D 

GED 

KBLAS 

KGEDAD 


Real  array 
R(al  array 
Integer  array 
Integer  array 


scaled  contravariant  displacement  field 
d'  to  E,  conversion  tensor 
block  addressing  structure 
GED  addressing  structure 


Inputs  Through  Common 


Common  Block  /COMADP/ 


MINCO  Integer 
MINCOG  Integer 
MODKEY  Integer 
MOFSET  Integer 
MORIGG  Integer 
MSIZO  Integer 

MSPACE  Integer 
MXPDIM  Integer 


Mesh  INC’rement  Origin  in  BLAS 

Mesh  INC'rement  Origin  in  G  addressing 

Mesh  orthogonality  and  dimension  kev  location 

Mesh  OFfSET  location  in  BLAS 

Mesh  ORIGin  in  G  addressing 

Mesh  SIZe  Origin  in  BLAS 

Mesh  SPACE  reserved  location  in  BLAS 

Max  number  of  physical  dimensions!  =3) 


Common  Block  /COMDDP/ 
NL0MT2  Logical  array 


Arguments  of  Called  Routines 

ICLASS  Integer  argument  of  EXPERT 

ISUB  Integer  argument  of  EXPERT 
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Arguments  on  Exit  • 

B  Real  array  scaled  contra  variant  magnetic  field 

Outputs  Through  Common  ^ 

Common  Block  /COMADP/ 

NONE 

Common  Block  /COMDDP/ 

NONE 
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A. 1.5  Subroutine  NILVEC 

Externa]  Routines 

EXPERT 


Intrinsic  Functions 

MAX 

MOD 


Arguments  on  Entry 

KBLAS  Integer  array  block  addressing  structure 


Inputs  Through  Common 


Common  Block  /COMADP/ 


MINCO  Integer 

MODKEY  Integer 
MOFSET  Integer 
MORIGG  Integer 
MSIZO  Integer 

MSPACE  Integer 
MXPDIM  Integer 


Mesh  INC’rement  Origin  in  BLAS 

Mesh  orthogonality  and  dimension  key  location 

Mesh  OFfSET  location  in  BLAS 

Mesh  ORICJin  in  G  addressing 

Mesh  SIZe  Origin  in  BLAS 

Mesh  SPACE  reserved  location  in  BLAS 

Max  number  of  physical  dimensions(  =3) 


Common  Block  /COMDDP/ 
NLOMT1  Logical  array 


Arguments  of  Called  Routines 

ICLASS  Integer  argument  of  EXPERT 

ISUB  Integer  argument  of  EXPERT 

Arguments  on  Exit 

PV  Real  vector  field  set  to  zero 
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Outputs  Through  Common 


Common  Block  /COMADP/ 
NONE 


Common  Block  /COMDDP/ 
NONE 


32 


J  W  Eastwood.  R  W  Hockney  and  W  Arter 


RFFX(93)56 


A.  1.6  Subroutine  SETVEC 
External  Routines 
EXPERT 

Intrinsic  Functions 

MAX 

MOD 

Arguments  on  Entry 

KBLAS  Integer  array  block  addressing  structure 

KCASE  Integer  =1  for  d  and  =2  for  b  fields 

PVALS  Real  array  3-  vector  of  values  to  which  PV  is  set 


Inputs  Through  Common 


Common  Block  /COMADP/ 


MINCO  Integer 
MODKEY  Integer 
MOFSET  Integer 
MSIZO  Integer 

MSPACE  Integer 
MXPDIM  Integer 


Mesh  INC’renient  Origin  in  BI.AS 

Mesh  orthogonality  and  dimension  key  location 

Mesh  OFfSKT  location  in  BLAS 

Mesh  SIZe  Origin  in  BLAS 

Mesh  SPACE  reserved  location  in  BLAS 

Max  number  of  physical  dimensions(  =3) 


Common  Block  /COMDDP/ 
NL0MT1  Logical  array 


Arguments  of  Called  Routines 

ICLASS  Integer  argument  of  EXPERT 
ISUB  Integer  argument  of  EXPERT 

Arguments  on  Exit 

PV  Real  array  vector  to  be  set  to  PVALS 
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Outputs  Through  Common 

Common  Block  /COMADP/ 

NONE 

Common  Block  /COMDDP/ 

NONE 
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A. 1.7  Subroutine  ADDVEC 
External  Routines 
EXPERT 

Intrinsic  Functions 

MAX 

MOD 

Arguments  on  Entry 

KBLAS  Integer  array  block  addressing  structure 

PV1  Real  array  input  vector  1 

PV2  Rtal  array  input  vector  2 


Inputs  Through  Common 


Common  Block  /COMADP/ 


MINCO  Integer 
MODKEY  Integer 
MOFSET  Integer 
MSIZO  Integer 

MS  PACE  Integer 
MXPDIM  Integer 


Mesh  INOrement  Origin  in  BIAS 

Mesh  orthogonality  and  dimension  key  location 

Mesh  OFfSET  location  in  BLAS 

Mesh  SIZe  Origin  in  BLAS 

Mesh  SPACE  reserved  location  in  BLAS 

Max  number  of  physical  dimensions)  =3) 


Common  Block  /COMDDP/ 
NL0MT2  Logical  array 


Arguments  of  Called  Routines 

ICLASS  Integer  argument  of  EXPERT 

ISUB  Integer  argument  of  EXPERT 

Arguments  on  Exit 

PV1  Real  array  output  vector  PYl  :=  PYl  +  PV2 
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Outputs  Through  Common 

Common  Block  /COMADP/ 

NONE 

Common  Block  /COMDDP/ 

NONE 
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A. 1.8  Subroutine  CPYVEC  • 

External  Routines 

EXPERT 


Intrinsic  Functions 

MAX 

MOD 


Arguments  on  Entry 

KBLAS  Intiger  array  block  addressing  structure 

PV2  Real  array  input  vector 


Inputs  Through  Common 


Common  Block  /COMADP/ 


MINCO  Integer 
MODKEY  Integer 
MOFSET  Integer 
MSIZO  Integer 

MSPACE  Integer 
MXPDIM  Integer 


Mesh  INCrement  Origin  in  BLAS 

Mesh  othogonalitv  and  dimension  key  location 

Mesh  OFfSET  location  in  BLAS 

Mesh  SIZe  Origin  in  BLAS 

Mesh  SPACE  reserved  location  in  BLAS 

Max  number  of  physical  dimensions!  =3) 


Common  Block  /COMDDP/ 
NL0MT2  Logical  array 


Arguments  of  Called  Routines 

ICLASS  Integer  argument  of  EXPERT 

ISUB  Integer  argument  of  EXPERT 


Arguments  on  Exit 

PV1  Real  array  output  vector  set  to  PV2 
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Outputs  Through  Common 

Common  Block  /COMADP/ 

NONE 

Common  Block  /COMDDP/ 

NONE 
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A.  1.9  Subroutine  AVEVEC 
External  Routines 
EXPERT 

Intrinsic  Functions 

MAX 

MOD 

Arguments  on  Entry 

KBLAS  Integer  array  block  addressing  structure 

PV1  Real  array  input  vector  1 

PV2  Real  array  input  vector  2 

Inputs  Through  Common 

Common  Block  /COMADP/ 

MINCO  Integer  Mesh  INC’reinent  Origin  in  BLAS 

MODKEY  Integer  Mesh  orthogonality  and  dimension  key  location 

MOFSET  Integer  Mesh  OFfSET  location  in  BLAS 

MSIZO  Integer  Mesh  SIZe  Origin  in  BLAS 

MSPACE  Integer  Mesh  SPACE  reserved  location  in  BLAS 

MXPDIM  Integer  Max  number  of  physical  dimensions!  =3) 

Common  Block  /COMDDP/ 

NL0MT2  Logical  array 

Arguments  of  Called  Routines 

ICIASS  Integer  argument  of  EXPERT 

ISUB  Integer  argument  of  EXPERT 

Arguments  on  Exit 

PV1  Real  array  output  PV1  :=  (PV1  +  PV2)/2 
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Outputs  Through  Common 

Common  Block  /COMADP/ 
NONE 

Common  Block  /COMDDP/ 
NONE 
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A.2  Electromagnetic  boundary  condition  routines 
A. 2.1  Subroutine  BCOPAT 
External  Routines 

EXPERT 

MESAGE 


Intrinsic  Functions 

MAX 

MOD 


Arguments  on  Entry 

D  Real  array 

DDOT  Real  array 

KBLAS  Integer  array 

KO  Integer  array 

KPBCAT  Integer 

KTYPE  Integer 

KX  Integer  array 

PATRIB  Real  array 


displacement  field 

displacement  current 

block  addressing  structure 

location  of  patch  origin 

patch  boundary  condition  attribute  pointer 

patch  type 

location  of  patch  extreme 
patch  attribute  table 


Inputs  Through  Common 


Common  Block  /COMADP/ 


MBAINC  Integer 
MINCO  Integer 

MODKEY  Integer 
MOFSET  Integer 
MSIZO  Integer 

MSPACE  Integer 
MXPDIM  Integer 


be  attribute  table  step 

Mesh  INCrement  Origin  in  BLAS 

Mesh  orthogonality  and  dimension  key  location 

Mesh  OFfSET  location  in  BLAS 

Mesh  SIZe  Origin  in  BLAS 

Mesh  SPACE  reserved  location  in  BLAS 

Max  number  of  physical  dimensions(=3) 


Common  Block  /COMDDP/ 
NL0MT2  Logical  array 
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Arguments  of  Called  Routines 

ICLASS  Integer  argument  of  EXPERT 
ISUB  Integer  argument  of  EXPERT 

Arguments  on  Exit 

D  Real  array  D  modified  by  boundary  conditions 

DDOT  Real  array  DDOT  modified  by  boundary  conditions 

Outputs  Through  Common 

Common  Block  /COMADP/ 

NONE 

Common  Block  /COMDDP/ 


NONE 
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A. 2. 2  Subroutine  BCONE 
External  Routines 

EXPERT 

MESAGE 

Intrinsic  Functions 


Arguments  on  Entry 


E 

KBLAS 

KO 

KTYPE 

KX 


Real  array 
Integer  array 
Integer  array 
Integer 
Integer  array 


electric  field 

block  addressing  structure 
location  of  patch  origin 
patch  type 

location  of  patch  extreme 


Inputs  Through  Common 


Common  Block  /COMADP/ 


MINCO  Integer 

MODKEY  Integer 
MOFSET  Integer 
MSIZO  Integer 

MSPACE  Integer 


Mesh  INCrement  Origin  in  BLAS 

Mesh  orthogonality  and  dimension  key  location 

Mesh  OFfSET  location  in  BLAS 

Mesh  SIZe  Origin  in  BLAS 

Mesh  SPACE  reserved  location  in  BLAS 


MXPDIM  Integer  Max  number  of  physical  dimensions(=3) 


Common  Block  /COMDDP/ 
NLOMT2  Logical  array 


Arguments  of  Called  Routines 


ICLASS 

ISUB 


Integer 

Integer 


argument  of  EXPERT 
argument  of  EXPERT 
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Arguments  on  Exit 

E  Real  array  E  modified  by  boundary  conditions 

Outputs  Through  Common 

Common  Block  /COMADP/ 

NONE 

Common  Block  /COMDDP/ 

NONE 
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External  Routines 

EXPERT 
MESAGE 
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Intrinsic  Functions 

ISIGN 

MAX 

MOD 

Arguments  on  Entry 

block  addressing  structure 
select  copy/add  to/from  gluepatch 
location  of  patch  origin 
patch  type 

location  of  patch  extreme 
gluepatch  buffer  array 
block  vector  field  array 

Inputs  Through  Common 


KBLAS 

Integer  array 

KCASE 

Integer 

KO 

Integer  array 

KTYPE 

Integt  r 

KX 

Integer  array 

PATCH 

Real  array 

VEC 

Real  array 

Common  Block  /COMADP/  • 

MINCO  Integer  Mesh  INCrement  Origin  in  BLAS 

MODKEY  Integ  er  Mesh  orthogonality  and  dimension  key  location 

MOFSET  Integer  Mesh  OFfSET  location  in  BLAS 

MSIZO  Integer  Mesh  SIZe  Origin  in  BLAS 

MSPACE  Integer  Mesh  SPACE  reserved  location  in  BLAS 

MXPDIM  Integer  Max  number  of  physical  dimensions(=3) 


Common  Block  /COMDDP/  • 

NL0MT2  Logical  array 
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Arguments  of  Called  Routines  • 

IC1  Integer  argument  of  MAX 

ICLASS  Integer  argument  of  EXPERT 

INGP  Integer  arrajrgument  of  MAX 

ISUB  Integer  argument  of  EXPERT  • 


Arguments  on  Exit 

KLEN  Integer 

PATCH  Real  array 

VEC  Real  array 


Outputs  Through  Common  • 

Common  Block  /COMADP/ 

NONE  # 

Common  Block  /COMDDP/ 

NONE 
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A. 3  Particle  routines  • 

A. 3.1  Subroutine  SETCUR 

External  Routines 
EXPERT 

NILVEC  set  vector  to  zero 

Intrinsic  Functions  ^ 

NONE 


Arguments  on  Entry 

KBLAS  Integer  army  block  addressing  structure 

Inputs  Through  Common 

Common  Block  /COMADP/ 

NONE 

Common  Block  /COMDDP/ 

NL0MT2  Logical  array 


Arguments  of  Called  Routines 


ICLASS  Integer 

ISUB  Integer 

KBLAS  Integer 

PCUR  Real 


argument  of  EXPERT 
argument  of  EXPERT 
argument  of  NILVEC 
argument  of  NILVEC 


Arguments  on  Exit 

PCUR  Real  array  initialised  current  array 


Outputs  Through  Common 
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Common  Block  /COMADP/ 

NONE 

Common  Block  /COMDDP/ 

NONE 
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A. 3.2  Subroutine  MOVCUR 

External  Routines 

ASSCUR  assign  current  to  block  element  net 

EXPERT 

IVAR 

MESAGE 

PBLINK  manage  particle  buffering  between  blocks 


Intrinsic  Functions 

INT 

MOD 

SQRT 


Arguments  on  Entry 

KBLAS  Integer  array 

KCASE  Integer 

KCOVA  Integer  array 

KPART  Integer  array 

PARTAT  Real  array 

PBUF1  Real  array 

PCOORD  Real  array 

PECOV  Real  array 


block  addressing  structure 

0  for  original  move.  >  0  for  exchange  buffer  data 

block  addressing  for  basis  vectors 

particle  addressing  structure 

particle  attribute  table 

input  particle  buffer 

particle  coordinate  array 

basis  vector  array 


Inputs  Through  Common 


Common  Block  /COMADP/ 


MINCO  Integer 
MNMOM  Integer 
MNPOS  Integer 
MOCPS  Integer 
MOFSET  Integer 
MORGTI  Integer 
MPAINC  Integer 
MPNOO  Integer 
MPORO  Integer 
MSIZO  Integer 

MSPACE  Integer 


Mesh  INCrement  Origin  in  BLAS 

loc  of  No  of  particle  MOMentum  coords  in  LPARAS 

loc  of  No  of  particle  POSition  coords  in  LPARAS 

offset  for  Charge  Per  Superparticle  value 

Mesh  OFfSET  location  in  BLAS 

Mesh  ORIGin  of  1/T  in  Ecov  addressing 

particle  attribute  table  step 

Particle  NO  Origin  in  LPARAS 

Particle  Origin  table  Origin  in  LPARAS 

Mesh  SIZe  Origin  in  BLAS 

Mesh  SPACE  reserved  location  in  BLAS 
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MSPEC  Integer  location  of  no  of  SPECies  in  LPARAS 

MXPDIM  Integer  Max  number  of  physical  dimensions(=3) 

NOECA  Integer  no  of  ECOV  addressing  entries  per  component 


Common  Block  /COMBAS/ 
NONE 

Common  Block  /CHABAS/ 

NONE 

Common  Block  /COMDDP/ 
NLOMT2  Logical  array 


Arguments  of  Called  Routines 


ICLASS 

Integer 

argument  of  EXPERT 

IFACEK 

Integer 

argument  of  PBLINK 

INC 

Integer  array  argument  of  ASSCUR 

INXBUF 

Integer 

argument  of  PBLINK 

IOFSET 

Integt  r 

argument  of  ASSCUR 

IOPARO 

Integer 

argument  of  IVAR 

IPOMAX 

Integer 

argument  of  IVAR 

IS  PACE 

Integer 

argument  of  ASSCUR 

ISUB 

Integer 

argument  of  EXPERT 

JPARTO 

Integer 

argument  of  IVAR 

JSPEC 

Infege  r 

argument  of  PBLINK 

NSTEP 

lntege  r 

argument  of  IVAR 

PCUR 

Real 

argument  of  ASSCUR 

ZCMULT 

Rea! 

argument  of  ASSCUR 

ZXN 

Real  array 

argument  of  ASSCUR 

ZXO 

Real  array 

argument  of  ASSCUR 

Arguments  on  Exit 


KPART  Integer  array 

PBUFO  Real  array 

PCOORD  Real  array 

PCUR  Real  array 


particle  addressing  structure 
output  particle  buffer 
coordinate  array 
current  array 
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Outputs  Through  Common 

Common  Block  /COMADP/ 

NONE 

Common  Block  /COMBAS/ 

NONE 

Common  Block  /CHABAS/ 

NONE 

Common  Block  /COMDDP/ 

NONE 

Common  Block  /XXX/ 

NONE 

• 
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Subroutine  ASSCUR 

External  Routines 

JOIN  GHOST  routine  (for  testing) 

LINCOL  GHOST  routine  (for  testing)  trajectory  element 
MERGE  merge  ordered  list  of  intersections 

PICNOW  GHOST  routine  (for  testing) 

POSITN  GHOST  routine  (for  testing) 

Intrinsic  Functions 

INTS 

MOD 


Arguments  on  Entry 

KINC  Integer  array 

KOFSET  Integer 

KSPACE  Integer 

PCMULT  Real 

PCUR  Real  array 

PXN  Real  array 

PXO  Real  array 

Inputs  Through  Common 

Common  Block  /COMIBC/ 

NODIM  Integer  dimensionality 

NOEL1  Integer  array  no  of  elements  in  block  type/side 

XLEN1  Real  array  length  of  side  of  block  type 

Common  Block  /COMGMA/ 

XYZBLK  Real  array  global  reference  coordinate  of  block 


Common  Block  /COMADP/ 

MXPDIM  Integer  Max  number  of  physical  dimensions(=3) 


block  mesh  increment 

block  mesh  offset 

block  mesh  space  per  component 

current/particle  multiplier 

block  current  array 

end  position  of  virtual  particle 

start  position  of  virtual  particle 
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Common  Block  /XXX/ 

IBLOCK  Integer  current  block  number  (for  testing) 


Arguments  of  Called  Routines 


ILEN 

Integer 

argument  of  MERGE 

ILEN23 

Integer 

argument  of  MERGE 

INX 

Integer  array 

argument  of  MERGE 

IOR1 

Integer 

argument  of  MERGE 

IOR2 

Integer 

argument  of  MERGE 

I0R3 

Integer 

argument  of  MERGE 

I0R4 

Integer 

argument  of  MERGE 

ZALFA 

Real  an-ay 

argument  of  MERGE 

ZXN 

Real 

argument  of  JOIN 

ZXO 

Real 

argument  of  POSITN 

ZYN 

Real 

argument  of  JOIN 

ZYO 

Real 

argument  of  POSITN 

Arguments  on  Exit 

PCUR  Real  array  nodal  contravariant  currents 

Outputs  Through  Common 

Common  Block  /COMIBC/ 

NONE 

Common  Block  /COMGMA/ 

NONE 

Common  Block  /COMADP / 

NONE 

Common  Block  /XXX/ 
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Subroutine  MERGE 
Externa]  Routines 
NONE 


Intrinsic  Functions 
NONE 


Arguments  on  Entry 


KA 

Integer 

length  of  list  a 

KB 

Integer 

length  of  list  b 

PA 

Real  array 

list  of  increasing  numbers  a 

PB 

Real  array 

list  of  increasing  numbers  b 

Inputs  Through  Common 

NONE 


Arguments  of  Called  Routines 

NONE  • 

Arguments  on  Exit 

KC  Integ  f.r  length  of  merged  list  c  ^ 

PC  Real  ordered  merged  lists  a  and  b 

Outputs  Through  Common 

NONE  • 
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A. 3.3  Subroutine  ACCEL 
External  Routines 
EXPERT 


Intrinsic  Functions 

MOD 
SQRT 

Arguments  on  Entry 

B  Real  array 

E  Real  array 

KBLAS  Integer  array 

KCOVA  Integer  array 

KPART  Integer  array 

PARTAT  Real  array 
PCOORD  Real  array 
PECOV  Real  array 

Inputs  Through  Common 

Common  Block  /COMADP/ 

MINCO  Integer  Mesh  INCrement  Origin  in  BIAS 

MNMOM  Integer  loc  of  No  of  particle  MOMentum  coords  in  LPARAS 

MNPOS  Integer  loc  of  No  of  particle  POSition  coords  in  LPARAS 

MOCPS  Integer  offset  for  Charge  Per  Superparticle  value 

MOFSET  Integer  Mesh  OFfSET  location  in  BLAS 

MORGTI  Integer  Mesh  ORIGin  of  1/T  in  Ecov  addressing 

MORIGT  Integer  Mesh  ORIGin  of  T  in  Ecov  addressing 

MPAINC  Integer  particle  attribute  table  step 

MPNOO  Integer  Particle  NO  Origin  in  LPARAS 

MPORO  Integer  Particle  Origin  table  Origin  in  LPARAS 

MSIZO  Integer  Mesh  SIZe  Origin  in  BLAS 

MSPACE  Integer  Mesh  SPACE  reserved  location  in  BLAS 

MSPEC  Integer  location  of  no  of  SPECies  in  LPARAS 

MXPDIM  Integer  Max  number  of  physical  dimensions(=3) 

NOECA  Integer  no  of  ECOV  addressing  entries  per  component 


scaled  contravariant  magnetic  field 
covariant  electric  field 
block  addressing  structure 
block  addressing  for  basis  vectors 
particle  addressing  structure 
particle  attribute  table 
particle  coordinate  arras- 
basis  sector  array 
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NOECC  Integer  no  of  ECOV  components  per  block 

Common  Block  /COM  8 AS/ 

NONE 

Common  Block  /CHABAS/ 

NONE 

Common  Block  /COMDDP/ 

NL0MT2  Logical  array 

Arguments  of  Called  Routines 

ICLASS  Integer  argument  of  EXPERT 
ISUB  Integer  argument  of  EXPERT 

Arguments  on  Exit 

PCOORD  Real  array  particle  coordinate  array 

Outputs  Through  Common 

Common  Block  /COMADP/ 

NONE 

Common  Block  /COM BAS/ 

NONE 

Common  Block  /CHABAS/ 

NONE 

Common  Block  /COMDDP/ 

NONE 

56 


Arguments  on  Entry 
D 

KBLAS 
KO 

KPART 
KX 

PCHG 
PCPP 


Inputs  Through  Common 


Common  Block  COMADP  // 

MINCO  Integer  Mesh  INCrement  Origin  in  BLAS 

MNMOM  Integer  loc  of  No  of  particle  MOMentum  coords  in  LPARAS 

MNPOS  Integer  loc  of  No  of  particle  POSition  coords  in  LPARAS 

MODKEY  Integer  Mesh  orthogonality  and  dimension  key  location 

MOFSET  Integer  Mesh  OFfSET  location  in  BLAS 

MSIZO  Integer  Mesh  SIZe  Origin  in  BLAS 

MSPACE  Integer  Mesh  SPACE  reserved  location  in  BLAS 

MSPEC  Integer  location  of  no  of  SPECies  in  LPARAS 

MXPDIM  Integer  Max  number  of  physical  dimensions!  =3) 
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Real  array  scaled  contravariant  displacement  field 

Integer  array  block  addressing  structure 

Integer  array  location  of  patch  origin 

Integer  array  particle  addressing  structure 

Integer  array  location  of  patch  extreme 

Real  array  charge  density 

Real  charge  per  superparticle 
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Common  Block  /COMBAS/ 

NONE 

Common  Block  /COMDDP/ 

NLOMT2  Logical  array 

Arguments  of  Called  Routines 

ICLASS  Integer  argument  of  EXPERT 
ISUB  Integer  argument  of  EXPERT 


Arguments  on  Exit 

KLENO  Integer  length  of  PBUFO 

PBUFO  Real  array  gluepatch  buffer  containing  new  particles 

PCHG  Real  array  modified  PCHG  at  cathode  surface 

Outputs  Through  Common 

Common  Block  /COMADP/ 

NONE 


Common  Block  /COMBAS/ 
NONE 

Common  Block  /COMDDP/ 
NONE 
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A.4.2  Subroutine  QSHARE 
External  Routines 
EXPERT 

Intrinsic  Functions 

MOD 


Arguments  on  Entry 

KBLAS  Integer  array  block  addressing  structure 

KPART  Integer  array  particle  addressing  structure 

PARTAT  Real  array  particle  attribute  table 

PCHG  Real  array  charge  density 

PCOORD  Real  array  particle  coordinate  array 


Inputs  Through  Common 
Common  Block  /COMADP/ 

MINCO  Integer  Mesh  INCrement  Origin  in  BLAS 

MNMOM  Integer  loc  of  No  of  particle  MOMentum  coords  in  LPARAS 

MNPOS  Integer  loc  of  No  of  particle  POSition  coords  in  LPARAS 

MOCPS  Integer  offset  for  Charge  Per  Superparticle  value 

MOFSET  Integer  Mesh  OFfSET  location  in  BLAS 

MPAINC  Integer  particle  attribute  table  step 

MPNOO  Integer  Particle  NO  Origin  in  LPARAS 

MPORO  Integer  Particle  Origin  table  Origin  in  LPARAS 

MSIZO  Integer  Mesh  SIZe  Origin  in  BLAS 

MSPACE  Integer  Mesh  SPACE  reserved  location  in  BLAS 

MSPEC  Integer  location  of  no  of  SPECies  in  LPARAS 

MXPDIM  Integer  Max  number  of  physical  dimensions(=3) 


Common  Block  /COMBAS/ 


NONE 
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Common  Block  /CHABAS/ 

NONE 

Common  Block  /COMDDP/ 

NL0MT2  Logical  array 

Arguments  of  Called  Routines 

ICLASS  Integer  argument  of  EXPERT 
ISUB  Integer  argument  of  EXPERT 

Arguments  on  Exit 

PCHG  Real  array  charge  density 

Outputs  Through  Common 

Common  Block  /COMADP/ 

NONE 

Common  Block  /COM BAS/ 

NONE 

Common  Block  /CHABAS/ 

NONE 

Common  Block  /COMDDP/ 

NONE 
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ABSTRACT 
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A  new  relativistic  electromagnetic  PIC  algorithm  for  general  two  and  three  di¬ 
mensional  geometries  is  described.  Correct  choice  of  co-  and  contravariant  field 
components  and  of  weighting  yields  simple  coordinate  invariant  numerical  pre¬ 
scriptions.  The  combination  of  isoparametric  hexahedral  elements,  generated 
by  transfinite  interpolation,  and  multiblock  decomposition  leads  to  algorithms 
ideally  suited  to  MIMD  computers. 
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Chapter  1 
Introduction 


1.1  Contents 

Virtual  Particle  (VP)  particle- mesh  algorithms  are  now  established  as  an  ef¬ 
fective  approach  to  obtaining  numerical  schemes  for  solving  the  relativistic 
Maxwell- Vlasov  equations  (6,  7,  8,  2,  1,  9].  Unlike  conventional  Particle-in- 
Cell  (PIC)  schemes,  they  are  derived  using  finite  elements  in  both  space  and 
time.  Current  is  assigned  from  ‘virtual  particles’  placed  at  points  specially 
interpolated  between  positions  at  successive  time  levels,  a  procedure  which 
automatically  leads  to  charge  conservation.  Existing  VP  implementations  use 
rectangular  finite  elements  in  two  dimensional  cartesian  and  polar  geometries. 
Only  a  restricted  class  of  device  is  well  modelled  in  such  circumstances,  lead¬ 
ing  to  the  need  to  implement  VP  in  more  complex  geometries.  This  report 
shows  how  the  VP  algorithms  extend  to  general  three  dimensional  body-fitted 
elements  [1],  The  resulting  multiblock  decomposition  of  both  field  and  particle 
data  allow  efficient  usage  of  Distributed  Memory  MIMD  architecture  comput¬ 
ers. 

The  report  has  been  broken  down  into  a  number  of  largely  self  contained 
chapters,  the  contents  of  which  are  as  follows: 

Chapter  2  contains  definitions  and  identities  in  general  curvilinear  coor¬ 
dinates  used  in  the  derivation  of  the  equations  for  the  isoparametric  finite 
element  particle-mesh  schemes. 

Chapter  S  describes  the  method  of  dividing  complex  objects  into  a  set  of 
curvilinear  hexahedral  blocks,  and  of  subdividing  those  blocks  into  hexahedral 
finite  elements  using  transfinite  interpolation.  Metric  tensors  and  basis  vec¬ 
tors  are  defined  by  using  coordinate  transformations  isometric  to  the  electric 
potential  representation. 

Chapter  4  outlines  a  number  of  alternative  Virtual  Particle  schemes  for  the 
solution  of  the  electromagnetic  field  equations  and  summarises  the  time  step 
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loop  for  the  chosen  variant. 

Chapter  5  presents  the  Virtual  Particle  charge  and  current  assignment 
schemes  in  general  curvilinear  coordinates  for  the  linear  basis  function  finite 
elements  discussed  in  Chapter  4  for  Maxwell’s  equations. 

Chapter  6  describes  the  treatment  of  boundary  conditions. 

Chapter  7  outlines  the  planned  data  organisation  in  the  MIMD  implemen¬ 
tation  of  the  algorithm. 


1.2  Acknowledgements 

This  report  was  prepared  as  part  of  a  research  project  sponsored  by  the  Air 
Force  Office  of  Scientific  Research  (AFSC)  under  Contract  F49620-92-C-0035, 
“General  Geometry  PIC  Algorithms  and  Software  for  Distributed  Memory 
MIMD  Computers”.  A  substantial  part  of  the  material  contained  in  this  re¬ 
port  was  developed  from  research  undertaken  during  earlier  research  projects 
ABR  40  451  “Electromagnetic  Modelling”  sponsored  by  AEA  Corporate  Re¬ 
search  and  RAE  IB/7  “3-D  Modelling  of  HPM  Sources”  sponsored  by  RAE 
Farnborough. 


Chapter  2 

Tensor  Definitions 


This  Chapter  contains  definitions  and  identities  in  general  curvi¬ 
linear  coordinates  used  in  the  derivation  of  the  equations  for  the 
isoparametric  finite  element  particle-mesh  schemes. 


2.1  Basis  Vectors  and  Components 

Let  the  reference  cartesian  coordinates  have  components  (x1,  x2,  x3)  and  unit 
vectors  (xj,  x2,  x3),  so  that  a  vector,  x,  may  be  written 

x  =  x’x,  (2.1.1) 


where  summation  over  the  repeated  index  i  (=  1,  2,  3)  is  implied.  Now  suppose 
that  (x1,  x2,  x3)  are  expressed  as  functions  of  the  curvilinear  coordinate 
components  (x1,  x2,  x3),  i.e. 


x1  =  x1  (x1,  xJ,  x3),  etc, 

(2.1.2) 

or  more  concisely 

X  =  x(x) 

(2.1.3) 

We  define  the  basis  vectors 

dx 

*'  “  aF 

(2.1.4) 

and  the  reciprocal  basis  vectors 

e’  =  Vx’ 

(2.1.5) 

A  vector  A  may  be  expressed  in  terms  of  its  contravariant  component  A'  or 
covariant  components  A,:- 


A  =  A'e,  :  contravariant 


(2.1.6) 


=  A,e'  :  covariant 


(2.1.7) 
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Note  that  in  general  e,  and  e‘  are  not  unit  vectors.  If  we  introduce  the  unit 
vectors. 

ei  =  ri  (2.1.8) 


then  (2.1.6)  may  be  written 

A  =  (yrie.De,  =  A(t)e.  (2.1.9) 

where  A(t)  are  the  physical  components. 

The  basis  vectors  and  reciprocal  basis  vectors  are  orthogonal 

eiei  =  6i  (2.1.10) 

where  6'}  is  the  Kronecker  delta  (=  1  if »  =  j,  0  otherwise). 

The  reciprocal  basis  vectors  can  be  written  in  terms  of  the  basis  vectors 


ej  x  et  _  e,  x  ek 
|e»  •  e,  x  e*|  ~  J 


(2.1.11) 


and  vice-versa 

e,  =  (eJ  x  e*)J  (2.1.12) 

where  the  Jacobian 

J  =  V?  =  |e<  ■  *j  x  et|  (2.1.13) 

can  be  written  in  terms  of  the  square  root  of  the  determinant,  g,  of  the  metric 
tensor. 

The  (covariant)  metric  tensor  is  defined  as 


9ij  -  e< '  ej 

and  its  reciprocal  tensor,  the  contravariant  metric  tensor  is 


It  follows  that 


gij  =  e'  ■  eJ 

9ik9ki  =  H 
9  =  llftll  =  J2 
A'  =  gij  A, 
Ai  =  gn  A* 


(2.1.14) 


(2.1.15) 


(2.1.16) 

(2.1.17) 

(2.1.18) 
(2.1.19) 


In  cartesian  coordinates,  the  covariant,  contravariant  and  physical  components 
are  identical  and  the  metric  tensor  reduces  to  the  g,}  =  6i}.  For  orthogonal 
systems,  gxj  —  0  for  i  ^  j.  In  general  gi}  is  symmetric  with  six  distinct 
elements,  gi}  =  gj,. 
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A  vector  A  is  independent  of  the  coordinate  system  it  is  represented  in.  Thus 
if  A  has  contravariant  components  A',  coordinates  x',  and  bases  e,  in  one 
system,  and  A\  x\  e,  in  another,  then 


Dotting  with  e'  gives 


A  =  A'e,  =  A'e, 


A 3  =  A'e,  ■  e3 


(2.2.1) 


(2.2.2) 


Similarly 


-  •  dx3 

A’  =  Aw 


A  -  A—— 
A)  ~  A,dx> 


Tensors  transform  similarly.  So  for  example 

d  _  d*k  d*‘  f, 

B'3  dx'dx>Bkl 


(2.2.3) 


(2.2.4) 


(2.2.5) 


where  Bi}  and  Bki  are  rank  2  tensors  in  the  unbarred  and  barred  coordinate 
systems,  respectively. 

From  Eq.(2.2.5),  it  follows  that  if  we  have  x  =  x(x),  where  x  are  cartesian, 
and  given  that  in  cartesians,  gtJ  =  6,j,  then  in  the  barred  coordinates  the 
metric  tensor  is 

dx'  dxJ  c  dx'  dx' 

Sk‘  ~  d^dii6,3  ~  dl^dP  (2,2X) 

which  is  the  same  as  given  by  Eq.(2.1.14). 

Volume  elements  transform  in  the  usual  fashion: 


dr  =  dx 1  dx 1  dx3  =  J  dx 1  dx 2  dx3 


where 


J  =  y/g  = 


2.3  Vector  Identities 


(2.2.7) 


a',ei  :  contravariant  components  and  bases 

di,e'  :  covariant  components  and  bases 
gij  :  metric  tensor,  g  =  ||#,|| 
p  :  scalar 
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ei]k 

&ijk 


} 


permutation  symbols:  =  1  for  cyclic  indices,  -1  for 
anticyclic  and  0  for  repeated  indices 

(2.3.1) 

:  permutation  tensor  =  e'}k /  ^fg 

(2.3.2) 

Cijk  ■  permutation  tensor  =  \fgetJk 

(2.3.3) 

a  •  b  =  a'bi  =  a,6' 

(2.3.4) 

(a  x  b)‘  =  tiika}bk 

(2.3.5) 

(a  x  b)i  =  e,]ka1bk 

(2.3.6) 

™  -  % 

(2.3.7) 

(2.3.8) 

o 

ill 

> 

(2.3.9) 

(2.3.10) 

_  .  d 

(2.3.11) 

vlf> * jjW  {s“^) 

(2.3.12) 

(2.3.13) 

2.4  Derivatives  of  Vectors 


Given  some  vector  A,  then  its  derivative  with  respect  to  some  cartesian  com¬ 
ponent  x'  is 

(2-41) 

In  general  (barred)  coordinates,  the  covariant  derivative  is  likewise  defined, 

(2.4.2) 


but  now  the  basis  vectors  are  no  longer  independent  of  the  coordinates.  From 

(2.4.2) 

a  Ai  (  a  1 

(2.4.3) 


*  =  w + A‘  {*'  •  ss**} 


The  term  is  the  curly  brackets  in  Eq.(2.4.3)  is  the  Christoffel  symbol  of  the 
first  kind.  From  Eqs.(2.1.4)  and  (2.1.5)  we  have 


i  dek  dx'  d2xl  _ 
dxi  dx1  dx]dxk  *k 


(2.4.4) 
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The  ChristofFel  symbols  are  not  tensors,  and  they  transform  between  two  gen¬ 
eral  coordinate  systems  according  to 


dx'  dxp  dxq  dx'  d2xl 
l>k~l”d^diJdxk  +  dx1  dHdxk 


(2.4.5) 


Repeating  the  above  argument  for  covariant  vector  components  we  have 


(2.4.6) 


-  dAi  *  -*  de< 

dxi  Ak  '  dx> 

—  ~  —  r*  At 

~  dx>  l'jAk 


(2.4.7) 


(2.4.8) 


To  obtain  (2.4.7),  we  used  e*  •  e<  =  6k  and  differentiated  term  by  term. 

The  ChristofFel  symbols  arise  when  we  wish  to  write  terms  such  as  a  •  Vb 
in  tensor  form,  or  in  the  absolute  (or  intrinsic)  time  derivative  of  a  vector. 

The  covariant  components  of  the  absolute  time  derivative  of  vector  A  are 
defined  as 

6AP  dA  d 

-sr  =  '’-jr=€’j,'A' 

dA,  I  8e’  1  dx1 

~  dl  +/1' ('”  '«*«/  it 

=  ijx--r»A’ir  l2-4-9» 

The  contravariant  components  are  likewise  defined 

It  ~  dt  ~ *  dt  ’A 


dA p  „  irdx« 
=  -7-  +  nrAr— 
dt  ,r  dt 


(2.4.10) 


2.5  Maxwell’s  Equations 


Using  the  formulae  of  Section  2.3  we  can  write  Maxwell’s  equations  in  tensor 
form:- 
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y/gB'  =  0 
y/gdx'v 


dD' 

dt 


- _  .< 
da  3 


(2.5.2) 

(2.5.3) 

(2.5.4) 


where  and  Ek  are  magnetic  and  electric  field  components,  D'  are  electric 
displacement  components  and  Hk  are  magnetic  intensity  components.  Quan¬ 
tities  j*  and  p  are  respectively  current  density  and  charge  density. 

It  is  convenient  to  introduce  extensive  current  and  charge  variables 

/*  =  sfgj'  (2.5.5) 

Q  =  V9P  (2-5.6) 

and  volume  scaled  flux  quantities  b'  and  for  magnetic  and  displacement 
fields 

6'  =  J gB ’  (2.5.7) 

<T  =  y/gW  (2.5.8) 

If  in  addition  we  write  the  permutation  tensor  in  terms  of  the  permutation 
symbol,  we  can  write  Maxwell’s  equations  as 

W 

dt 


dt_ 

dt 


Equations  (2.5.9)  -  (2.5.12)  have  the  particularly  attractive  feature  that  they 
have  the  same  form  irrespective  of  coordinate  system,  and  contain  no  explicit 
reference  to  the  metrics.  This  is  also  true  of  the  relationships  between  the 
electromagnetic  fields  and  potentials:- 


!  dx3 

(2.5.9) 

db' 

dr 

(2.5.10) 

-  r 

~  dx3 

(2.5.11) 

djt =Q 
d?  v 

(2.5.12) 

dd> 

Ek  =  ~l£-Ak 

b'  =  -e'lk~ 
dx3 


(2.5.13) 

(2.5.14) 


The  only  explicit  reference  to  the  metrics  appears  in  the  constitutive  relation¬ 
ships  between  fields,  which  in  vacuo  are 


PoHk  = 

V9 


(2.5.15) 


i 
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«o  Ek  =  ^pd1  (2.5.16) 

V9 

More  generally,  no  and  to  can  be  replaced  by  tensor  permeabilities  and  per¬ 
mittivities. 

A  similar  attractive  simplicity  appears  in  the  Action  Integral  for  the  elec¬ 
tromagnetic  field  equations:- 

/  =  J  dt  dx 1  dx2  dx3  -  HiF)  +  I' A,  -  Q<t>}  (2.5.17) 

The  metric  free  forms  simplifies  the  problem  of  writing  a  program  module  for 
solving  Maxwell’s  equations  in  arbitrary  non-orthogonal  coordinate  systems. 


2.6  Equations  of  Motion 


The  relativistic  equations  of  motion  of  a  charged  particle  are 


> 

II 

■si* 

(2.6.1) 

dp 

^  =  9(  E  +  vxB) 

(2.6.2) 

• 

where 

p  =  7m0v 

(2.6.3) 

(2.6.4) 

• 

Introducing  general  coordinates,  (x ',  x2,  x3),  i.e. 

X  =  x(i\  X2,  X3) 

(2.6.5) 

gives 

• 

dx  dx  dxk 

dt  dxk  dt 

dxk  l 

=  e*— 7—  =  e*t> 
dt 

(2.6.6) 

• 

Thus,  Eq.(2.6.1)  transforms  to 

*1% 

II 

ft- 

(2.6.7) 

Applying  the  results  of  Eqs.(2.3.3),  (2.3.6),  (2.4.9)  and  (2.5.7)  to  Eq. (2.6.2) 
gives  the  covariant  momentum  equation 


dp. 


-jjf  ~  IVrC’  =  q[E,  +  e„rt>’5r] 


■  • 


t 


(2.6.8) 
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The  particle  dynamics  can  be  included  in  the  action  integral,  Eq.(2.5.17),  by 
adding  the  extra  particle  lagrangian  term, 

(2.6.9) 

and  explicitly  expressing  the  charges  and  currents  in  Eq.(2.5.17)  as  sums  over 
particles,  t,  with  charges  9, 

Q  =  H  vM*)  ~  *‘W*2  -  x2)6(x 3  -  i3)  (2.6.10) 


dr  * 

Im  =  Y,  qt6(x]  -  x')6(x 1  -  x2)6(i3  -  x3)-^-  (2.6.11) 

The  equation  of  motion,  Eq.(2.6.8)  arises  from  the  Euler- Lagrange  equation 


d  r  _d_dL 
dx‘  dt  di‘ 


(2.6.12) 


where  L  is  the  total  Lagrangian  density. 


Chapter  3 

Finite  Element  Generation 


This  Chapter  describes  the  method  of  dividing  complex  objects 
into  a  set  of  curvilinear  hexahedral  blocks,  and  of  subdividing 
those  blocks  into  hexahedral  finite  elements  using  transfinite  in¬ 
terpolation.  Metric  tensors  and  basis  vectors  are  defined  by  using 
coordinate  transformations  isoi..etric  to  the  electric  potential  rep¬ 
resentation. 


3.1  Introduction 

A  two  level  decomposition  of  a  complex  object  into  a  set  of  finite  elements 
is  proposed.  First,  the  object  is  divided  into  a  set  of  curvilinear  hexahedral 
blocks.  In  selecting  this  multiblock  division,  it  is  important  to  choose  the  divi¬ 
sion  of  complex  objects  into  sufficiently  convex  volumes  to  avoid  the  creation 
of  very  small,  or  even  negative,  volume  elements  in  the  blending  process;  this 
could  cause  the  numerical  simulations  to  fail.  However  in  the  applications 
envisaged  it  is  unlikely  that  singular  elements  will  present  serious  difficulties. 
Computational  efficiency  dictates  that  wherever  possible,  orthogonal  blocks 
be  used  (such  as  rectangular  bricks  and  cylinder  sections)  as  this  greatly  re¬ 
duces  computational  costs  and  storage.  For  the  purposes  of  facilitating  load 
balancing  on  MIMD  computers,  it  may  be  advantageous  to  subdivide  blocks 
beyond  the  level  dictated  by  the  object’s  geometry. 

Each  block  of  the  multiblock  decomposition  is  described  by  its  bounding 
curves;  four  intersecting  curves  in  two  dimensions  and  twelve  curves  in  three 
dimensions.  Alternatively,  the  three  dimensional  block  may  be  described  by 
bounding  surfaces.  The  relationship  of  the  block  coordinates  to  the  global 
coordinates  of  the  object  is  described  in  Section  3.5.  The  locations  of  element 
nodes  within  a  block  are  generated  by  blending  the  interpolants  from  the 
bounding  curves  or  surfaces,  as  described  in  Sections  3.3  and  3.4. 

In  order  to  transform  between  curved  space  (barred)  and  physical  coordi¬ 
nates,  and  to  integrate  the  equations  of  motion,  values  of  the  contravariant 
basis  vectors  (Eq.(2.1.4))  are  required.  The  integration  of  the  field  equations 
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requires  metric  tensor  values  (Eq.(2.1.4)).  These  may  be  evaluated  from  the 
finite  element  approximations  to  the  coordinate  transformations,  as  outlined 
in  the  following  section. 


3.2  Isoparametric  Elements 

3.2.1  2-D:  Linear  Quadrilaterals 


Figure  3.1:  The  isoparametric  linear  quadrilateral  element  with  comer  nodes 
Xi . . .  X*  maps  to  the  unit  square  in  the  barred  coordinate  space. 

The  two  dimensional  linear  isoparametric  element  uses  the  same  linear  support 
function  for  the  coordinate  mapping  as  for  the  finite  element  scalar  potential. 

Figure  3.1  shows  a  quadrilateral  element  in  physical  and  barred  coordinate 
space.  A  point  x  is  related  to  the  barred  coordinates  x  =  (xl,x*)  by 

• 

X  =  x,(l  -x')(l  -X7) 

+  XjX^l-X2) 

+  X^f1*2 


+  X4(1-XJ)X2 

(3.2.1) 

• 

Contravariant  bases 

ei  = 

dx 

dxl 

(x2  -  Xi)(l  -  X2)  +  (x3  -  x4)x2 
rs(i  -  x2)  +  rNi2 

(3.2.2) 

• 

e2  = 

=  rw(l  -x^  +  tex1 

(3.2.3) 

Covariant  bases 

e3  =  z 

(3.2.4) 

• 

H 

'e2  x  z\ 

K  J  J 

|  =  [(1  -  X1)^  X  z  +  xlrE  X  z]  jj 

(3.2.5) 
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e2  =  j  =  [(1  -  x2)z  xrs  +  x2z  x  r*]  /J 

(3.2.6) 

e3  =  z 

(3.2.7) 

Metric  tensor 

ffn  =  ei  •  ei  =  (1  -  x2)2r|  +  ( x2)7r%  +  2x2(l  -  x2)rs  •  tN 

(3.2.8) 

92 2  =  e2  •  e2  =  (1  -  xl)2r^,  +  (x1)Ve  +  2x'(l  -  x*)r£  •  rw 

(3.2.9) 

033  =  1 

(3.2.10) 

012  —  021  =  e2  ‘  ®1 

=  (1  -x1)(l  -x2)rs-  Tw  +X1(1  -x2)rs  •  Te  +  x’x2^  •  rE  +  (1  - 

x’)x2r/v  •  rw 

(3.2.11) 

023  =  032  =  013  —  031  =  0 

(3.2.12) 

J  -  y/g  =  ei  x  e2  •  e3 

=  (1  -  x')(l  -  x2)(rs  x  rw)  ■  i 
+  i‘(l-i2)(rsxr£).z 
+  x'x2(r*  x  rE)  ■  z 
+  (1  -  x')x2(r/v  x  rw)  •  i 

(3.2.13) 

The  formulae  (3.2.2)  -  (3.2.13)  give  a  geometrical  interpretation  of  the  ba¬ 
sis  vectors  and  metric  elements  in  terms  of  interpolation  in  the  element  side 

vectors  rw,  rs,  rE  and  rw- 

A  visualisation  of  the  reciprocal  vector  sets  is  provided  by  Figure  3.2. 

There,  both  the  contravariant  and  covariant  basis  vectors  are  sketched  for 
node  1  of  the  element.  At  node  1,  the  Jacobian  takes  a  value  equal  to  the  area 

of  the  parallelogram  formed  by  rw  and  rs'. 

J  =  Joo  =  rs  x  rw  •  z 

(3.2.14) 

and  the  vectors  are 

el  =  rw  x  z/Joo,  e,  =  rs 

(3.2.15) 

e2  =  z  x  rs/Joo,  *2  =  rw 

(3.2.16) 

3.2.2  3>D:  Linear  Hexahedra 

If  we  label  the  nodes  of  the  3-D  element  by  index  triplet  (i,j,k), 
1,  and  let 

i,j,  k  =  0  or 

• 

Wo(*)  =  (I-*) 

(3.2.17) 

VF,(r)  =  z 

(3.2.18) 
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Figure  3.2:  A  sketch  of  the  basis  vectors  at  node  1  of  the  quadrilateral  element 


then  the  3-D  analogue  of  Eq. (3.2.1)  becomes 

x  =  x,]kW,(xl)W3(x')Wk(x3) 
The  contravariant  basis  vectors  become 


(3.2.19) 


ei  =  W  =  (xD*-Xoi*)V^(iJ)Wt(i3) 

=  rijkW}(x2)Wk(i>)  (3.2.20) 

Similarly 

ea-r,JAVF<(*,)^t(x3)  (3.2.21) 

«3  =  r,ijVF,(i‘)»Vi(xJ)  (3.2.22) 

The  covariant  basis  functions,  metric  tensor  elements,  etc  follow  by  substitut¬ 
ing  Eq.(3.2.19)  into  the  appropriate  formula  as  shown  in  Section  4.1. 


(3.2.21) 

(3.2.22) 


3.3  Two  Dimensional  Interpolation 

Mesh  generation  for  finite  volume  and  finite  element  schemes,  by  blending  the 
interpolants  from  intersecting  pairs  of  curves  such  that  the  mesh  exactly  fits 
the  boundary  curves,  was  introduced  by  Gordon  k.  Hall  [1]  and  is  now  widely 
used  in  body  fitting  fluid  flow  codes,  such  as  Harwell-  FLOW3D  [2,3].  We 
summarise  here  the  transfinite  interpolation  formulae  that  we  plan  to  use  in 
the  body  fitting  electromagnetic  particle-mesh  software. 


i 
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Figure  3.3:  The  four-sided  domain  is  hounded  by  two  pairs,  (Xo,Xi,)  and 
(Yo,  Yi),  of  curves  voith  intersections  at  the  four  points  x,j,  i,j  =  0  or  1. 

Figure  3.3  shows  a  four  sided  domain.  Each  of  the  boundary  curve  pairs 
is  a  function  of  a  single  variable,  denoted  r  or  s.  It  is  assumed,  without  loss 
of  generality,  that  0  <  r  <  1,  0  <  s  <  1.  The  problem  is  to  define  a  function 
x(r,  s)  such  that 


x(r,0)  =  Xo(r) 

(3.3.1) 

x(0,a)  =  Y0(s) 

(3.3.2) 

x(r,l)  =  X,(r) 

(3.3.3) 

and 

x(l,a)  =  Yj(s) 

(3.3.4) 

where  i,j  =  0  or  1. 

x(i,j)  =  Xj(t)  =  Y  i(j)  =  xi} 

(3.3.5) 

If  we  introduce  interpolating  functions  ifaj(s)  and  ^>i(s),  where  V’o(O)  = 
li  V’o(l)  =  0,  V’i(O)  =  0,  V’i(l)  =  1,  then  we  can  write  the  ‘horizontal’  curve 
interpolation 

xH(r,s)  =  Xo(r)V>o(a)  +  Xi(r)V>i(a)  (3.3.6) 

(Typically,  one  would  take  i/>  piecewise  linear,  i.e.  d’o  —  1  —  •*»  Vh  =  *)• 
Similarly,  we  can  define  the  ‘vertical’  curve  interpolation  using  interpolation 
functions  ^o(r)  and  ^i(r):- 

x„(r,a)  =  ^(r)Yo(a)  +  *,(r)Y,(a)  (3.3.7) 

The  interpolation  given  by  (3.3.6)  satisfies  (3.3.1)  and  (3.3.3),  but  not  in  gen¬ 
eral  (3.3.2)  and  (3.3.4).  Similarly,  Eq.(3.3.7)  satisfies  (3.3.2)  and  (3.3.4),  but 
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not  generally  (3.3.1)  and  (3.3.3).  Both  interpolants  (Eqs.(3.3.6)  and  (3.3.7)) 
satisfy  the  corner  constraints,  Eq.(3.3.5),  as  does  the  corner  interpolant 

xc(r,s)  =  Xoo<Mr)V>o(s)  +  Xoi<fo(r)V>i(s)  +  x,o<Mr)0o(3)  +  XnMr)'l>i(*) 

(3.3.8) 

Taking  a  linear  combination  of  Eqs.(3.3.6)-(3.3.8)  gives 

x  =  ax*  +  0xv  -f  yxc  (3.3.9) 

Evaluating  Eq. (3.3.9)  at  the  sides  r  =  0,  1,  and  s  =  0, 1,  and  equating  the  re¬ 
sults  to  Eqs.(3.3.1)-(3.3.5)  yields  the  desired  transfinite  interpolation  function 
if  or  =  fi  =  —7  =  1.  Thus  we  may  write  our  transfinite  interpolation  function 

x(r,  s)  =  Xj (r)V>j(a)  +  Yi(s)<&(r)  -  Xi}<t>,(r)ti>j(s)  (3.3.10) 

where  the  sums  over  i,j  =  0  or  1  are  implied. 

3.3.1  A  Tabular  Curve  Example 

Given  the  ‘horizontal’  tabular  curves 

Xo(/),X,(7),  1  =  0, NX  (3.3.11) 

and  ‘vertical’  tabular  curves 

Y0( J),  Y,( J),  J  =  0,NY  (3.3.12) 

where 

Xo(0)  =  Y0(0),  X,(0)  =  Yo(JVn 

Xo(NX)  =  Y,(0),  Xi(NX)  =  Yi(NY)  (3.3.13) 

and  assuming  linear  interpolation,  we  shall  obtain  the  expression  for  the  coor¬ 
dinates  of  node  (I,J). 

From  the  definitions  of  r,  s,  /,  J,  we  have 

r  =  I /NX,  3  =  J/NY  (3.3.14) 

For  linear  interpolation 

&(z)  =  =  l-z;  0  <  2  <  1  (3.3.15) 

^i(z)  =  tpi (z)  =  2;  0  <  z  <  1  (3.3.16) 

x(/,J)  =  Xo(/)(l  -s)  +  X,(/)s 
+  Y0(J)(l-r)  +  Y,(J)r 

-  Xoo(l  -  r)(l  -  s)  -  Xoi(l  -  r)s  -  x10r(l  -  s)  -  xn»(*-3.17) 
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The  transfinite  interpolation  example  given  above  can  be  readily  extended  to 
provide  global  definitions  of  basis  vectors  and  metric  tensor  elements  in  terms 
of  the  boundary  data  values.  If  we  assume  that  the  domain  enclosed  by  the 
‘horizontal’  and  ‘vertical’  curve  pairs  maps  to  the  rectangle  0  <  x1  <  NX, 
0  <  x7  <  NY  in  curved  space  (so  the  curved  space  grid  would  have  unit 
square  elements),  and  that  points  on  the  boundary  curves  are  joined  by  straight 
segments,  then 

Xotz1)  =  Xo(/)  +  (Xo(/  +  1)  -  Xo(/))(f*  -  I)  (3.3.18) 


for  /  <  z1  <  /+  1,  etc,  and  (17)  may  be  extended  to  general  position  (i1,!2):- 


x(i\x2)  =  X0(xl)  +  (Xl(xl)-Xo(xl)-x0,)x3/NY 
+  Y0(z2)  +  (Y,(i2)  -  Yo(zJ)  -  x10)x7 NX 
-  Xoo  +  (Xoi  +  Xio  -  Xoo  -  xn )xxx2/NX  NY  (3.3.19) 


from  which  values  of  contravariant  basis  vectors 


e,  = 


dx 

dx' 


and  metric  tensor  elements 
may  be  computed. 


9>j  —  ®*  ■  ej 


(3.3.20) 

(3.3.21) 


3.4  Three  Dimensional  Interpolation 

3.4.1  Interpolation  between  Specified  Curves 

The  two  dimensional  interpolation  scheme  of  the  previous  section  generalises 
straightforwardly  to  tnree  dimensions.  The  aim  is  to  divide  the  general  curvi¬ 
linear  h^xahedra  as  illus'rate  j  in  Figure  3  4  into  a  set  of  hexahedral  elements 
which  map  to  unit  cubes  in  curved  space. 

The  bounding  curve  X,;(r),  r«[0,  Ij  joins  the  corner  node  x^  at  r  =  0 
to  node  XitJ  at  r  =  1.  Similarly  Y1;(j)  joins  X&,  to  x,i;  and  Z,}(t)  joins  xtj0 
to  x,jj ,  where  i,j  =  0  or  1.  The  interpolation  functions  for  the  r,  s,  and  t 
coordinates  art  respectively  <p,  ip  and 

Blen  ’ing  interpolants  between  tL=  bounoag  curves  to  obtain  exact  fits  at 
all  eight  edges  gives  the  formula 

x(r,s,r/  =  X,*(r)^;>i->*;t'  •  Y,*(s)^,(r)ij*(t) 

+  7,,}(t  j<ijr)ip}(s)  -  >x,j*^(r)t(’>(3)»?*(0  (3.4.1) 


where  s..ms  over  i,j,  A  =  0  -nd  1  are  impl eo 


Figure  3.4:  The  six  sided  volume  is  defined  by  the  twelve  bounding  curves 
joining  the  comer  nodes  at  positions  Xy*,  i,j,  k  =  0  or  1 

3.4.2  Interpolation  between  Specified  Surfaces 

In  some  circumstances,  one  or  more  of  the  surfaces  may  be  specified,  e.g. 
by  some  analytic  formula.  Let  us  suppose  that  X,(s,  <),  s,te[0, 1]  represent 
opposite  surfaces  passing  through  the  four  points  Xc]k  for  i  =  0  and  Xij*  for 
i=l.  Similarly  Yj(r,t)  are  surfaces  passing  through  the  sets  of  points  x,0t 
and  x,u,  and  Z*(r,«)  are  surfaces  through  x,j0  and  Xyi,  respectively. 

In  the  case  where  all  eight  surfaces  are  specified,  the  relevant  interpolant 
is  [4]: 

x(r,s,t)  =  X,(s,t)<t>i(r)  +  Yj(r,t)i/>j(s)  +  Zk(r,s)ijk(t) 

-  X,k{r)iP}(s)r)k(t)  -  Y a(«)&(r)ifk(t)  ~  Zy(0&(rWi(*) 

+  Xijk<t>i{r)rl>j(s)Tik(t)  (3.4.2) 

If  one  or  more  of  the  surfaces  is  not  given  in  this  form,  but  only  in  terms  of 
its  bounding  curves,  then  we  use  the  two-dimensional  blending  interpolant  to 
give  an  expression  that  may  be  substituted  in  Eq.(3.4.2).  For  example,  we 
may  set  (for  k  =  0  and  for  k  =  1), 

Z*(r,  s)  =  Xjk(r)il>j(s)  +  Y,*(a)<fc(r)  -  Xijkd>i(r)rl>j(s)  (3.4.3) 

Re-expressing  all  of  X,-,  Y}  and  Z*  in  this  way  gives  us  back  Eq.(3.4.1). 


3.5  Multiblock  Decomposition 

The  multiblock  decomposition  divides  the  object  into  blocks,  and  these  blocks 
map  naturally  onto  processes  on  a  MIMD  computer  architecture.  Within  each 
block,  finite  element  contributions  are  assembled  on  the  uniform  lattice  in 
curved  coordinate  space,  and  particle  positions  are  stored  in  curved  space. 


( 


•  4 


J  W  Eastwood,  R  W  Hockney  and  W  Alter  RFFX(92)52 


23 


This,  as  will  be  shown  in  later  chapters,  leads  to  field  equations,  charge  as¬ 
signment  and  current  assignment  which  are  the  same  in  all  coordinate  systems, 
and  require  no  more  computational  work  than  for  the  uniform  cartesian  mesh 
case.  Geometrical  complications  are  swept  into  the  constitutive  relationships 
and  the  particle  accelerations. 

Efficient  MIMD  software  balances  work  across  processors  whilst  minimising 
the  amount  of  global  data  and  interprocessor  message  passing.  The  physics 
of  Maxwell’s  equations  provide  a  natural  ordering  «f  the  data  to  achieve  this, 
in  that  interactions  at  a  point  in  space  only  involve  information  from  its  light 
cone  in  space-time;  the  computational  domain  is  subdivided  into  blocks.  The 
blocking  is  chosen  to  simplify  implementation  of  boundary  condition  and  of 
data  visualisation,  and  to  optimise  the  utilisation  of  computer  resources.  Spa¬ 
tially  blocked  field  data  only  requires  data  from  neighbouring  blocks  in  the 
distributed  memory  MIMD  implementation. The  master  control  program  only 
requires  information  about  the  block  surfaces  ( “glue  patches” )  which  stick  to¬ 
gether  the  volume  filling  meshes  in  each  of  the  slave  block  processors.  This 
arrangement  offers  the  prospects  of  a  larger  computational  intensity  and  weak 
Amdahl  limit  on  parallel  processor  speedup.  Moreover,  the  simple  logical 
square  (cube  in  3-D)  addressing  within  each  block  leads  to  fast  serial  process¬ 
ing  within  each  slave  process. 

Without  compromising  the  subdivision  of  the  spatial  mesh  into  blocks  to 
get  efficient  MIMD  processing,  one  can  further  demand  that  boundary  con¬ 
ditions  only  apply  at  the  surfaces  of  block.  This  completely  eliminates  the 
addressing  problems  in  embedding  surfaces  within  blocks,  and  allows  surface 
data  to  be  passed  to  the  control  program  through  the  “glue  patch”  tables. 
Further  saving  of  computer  storage  and  time  arise  from  only  keeping  metric 
information  and  material  property  data  in  those  blocks  where  they  are  needed. 
When  a  large  number  of  small  blocks  are  needed  to  describe  a  complex  object, 
load  balancing  is  achieved  by  assigning  several  blocks  to  one  processor. 

In  summary  the  mesh  (or  more  correctly,  the  finite  element  net)  will  use: 

•  A  multiblock  spatial  decomposition  where  segments  of  target  surfaces 
and  other  boundaries  are  coincident  with  block  surfaces. 

•  Indirect  (“glue  patch”)  addressing  between  blocks,  and  logical  space  (i,j) 
(or  cube  (i,j,k)  in  3-D)  element  nets  within  blocks. 

•  Transfinite  interpolation  subdivision  of  the  curvilinear  quadrilateral  (hex- 
ahedral  in  3-D)  multiblocks  into  finite  elements. 

Figure  3.5  illustrates  the  multiblock  decomposition  of  a  complex  object 
into  a  set  of  curvilinear  sub-block  connected  by  glue  patches.  The  cylinder  (a) 
is  divided  into  five  block  (b).  These  blocks  are  treated  as  separate  objects  with 
their  own  coordinate  systems,  and  communicate  by  passing  field  and  particle 
data  via  the  glue  patches  (denoted  by  dashed  lines  in  (c)).  (d)  the  curvilinear 
quadrilateral  (hexahedral  in  3-D)  blocks  are  meshed  by  transforming  them  to 
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Figure  3.5:  An  illustration  of  the  multiblock  decomposition  and  meshing  of  a 
complex  object. 

rectangles  (bricks  in  3-D)  in  curved  space  and  dividing  the  rectangles  (bricks) 
into  unii  side  square  (cube)  elements.  Uniblock  particle  and  field  primitives 
will  be  defined  for  the  sub-blocks. 

Metric  and  basis  vectors  are  defined  by  the  transformation  between  the 
physical  and  curved  space  meshes.  The  simplicity  of  the  computation  of  metric 
elements  from  boundary  data  in  transfinite  interpolation  offers  the  option  of 
recomputing  values  of  elements  as  required  from  the  boundary  curve  set:-  for 
an  N  xNxN  mesh,  it  reduces  necessary  storage  to  the  12jV  boundary  vectors! 
Another  alternative,  and  one  we  propose  adopting,  is  to  store  global  reference 
points  for  each  uniblock,  plus  the  covariant  basis  vectors  and  metric  tensor 
elements  for  each  block  type.  In  most  cases,  symmetries  and  congruences 
greatly  reduce  the  amount  of  geometric  data  to  be  stored.  For  instance,  the 
example  shown  in  Figure  3.5  extended  to  3-D  perpendicular  to  the  plane  shown 
has  two  block  types  only;  the  central  rectangular  brick  block  type,  which 
requires  only  six  numbers  to  completely  specify  its  basis  vectors  and  metric 
tensor  elements,  and  the  surrounding  wedge  shaped  block  type,  which  requires 
a  single  plane  of  values  each  for  the  basis  vector  and  metrics.  Specifying  the 
four  reference  points  (0,0,0),  (1,0,0),  (0,1,0)  and  (0,0,1)  as  shown  in  Figure 
3.4  allows  similar  blocks  to  be  fitted  to  block  types  by  translation,  rotation, 
reflection  and  linear  scaling. 


Chapter  4 

Field  Equations 


In  this  Chapter,  a  number  of  alternative  Virtual  Particle  schemes 
are  derived  for  the  solution  of  the  electromagnetic  field  equations. 
Their  relative  merits  are  discussed  and  the  particular  variant  which 
will  form  the  basis  of  the  MIMD  software  is  specified. 


4.1  Introduction 

When  a  virtual  particle  electromagnetic  particle-mesh  scheme  is  derived,  it 
implies  a  specific  method  for  solving  Maxwell’s  Equations.  Such  a  method 
is  optimal  in  the  sense  that  it  minimises  the  approximated  action,  and  will 
inherit  the  many  desirable  properties  of  virtual  particle  particle-mesh  schemes 
where  applicable,  e.g.  being  well  suited  to  parallel  computer  implementation. 

In  this  chapter,  following  references  [6,  9],  virtual  particle  electromagnetic 
particle-mesh  schemes  for  general  three-dimensional  co-ordinate  systems  are 
outlined.  Effectively  the  schemes  differ  only  in  the  forms  taken  by  the  consti¬ 
tutive  relations.  Nevertheless,  a  meaningful  comparison  of  their  relative  merits 
is  possible  and  is  set  out.  The  chosen  algorithm  is  summarised  at  the  end  of 
the  chapter;  We  propose  that  the  simplest  lumped  scheme  form  the  basis  of 
the  software  in  the  first  instance. 


4.2  The  Variational  Formulation 

The  electromagnetic  field  action  integral  may  be  written  in  general  curvilinear 
coordinates  (x1,  xJ,  x3)  as 

I  =  Jdt  dx 1  dx1  dx3y/g  D'  -  Hi  B')  +  j'  A,  -  ,  (4.2. 1 ) 

where  electric  field 

=  (4-2.2) 
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magnetic  field 


electric  displacement 


magnetic  intensity 


_  e*dA> 
y/gdU' 


D'  =  t0E'  =  e0g'jEj, 


Hi  =  -  = 

tic  Ho 


current  density 


j'  =  H  -  *lM**  ~  > 

p  \9 


and  charge  density 


/»  =  5Z  -  *J)*(**  ~  -  *,)• 


(4.2.3) 


(4.2.4) 


(4.2.5) 


(4.2.6) 


(4.2.7) 


The  sum  over  p  is  over  particles,  each  with  charge  qp.  The  metric  tensor 
elements  g,}  can  be  computed  from  the  relationship  between  the  reference 
cartesian  coordinates  x*  and  the  general  curvilinear  coordinates  x’,  and  then 
9  =11  9H  II 

Discrete  equations  are  derived  by  introducing  finite  element  approxima¬ 
tions  to  the  potentials 

(4.2.8) 

A,  =  A(j)M/(<)  (4.2.9) 

where  ♦  and  A(,j  are  nodal  amplitudes.  Equation  (4.2.8)  is  shorthand  for 


4>(xl,  x2,  x3,  t)  =  53*(k,  n)[/(k,  n ;  x\  x2,  x3,  t) 


(4.2.10) 


where  the  sum  is  over  spatial  (k)  and  temporal  (n)  node  indices.  Eq.(  4.2.1)  is 
then  varied  with  respect  to  the  nodal  amplitudes  yielding  discrete  approxima¬ 
tions  to  the  inhomogeneous  Maxwell’s  equations.  (The  homogeneous  Maxwell 
equations  follow  by  virtue  of  Eqs.(4.2.2)  and  (4.2.3),  if  U  and  W,  are  appro¬ 
priately  chosen). 

4.2.1  The  Point  wise  Scheme 

The  simplest  general  geometry  extension  of  the  derivation  given  in  reference 
[6]  is  given  by  supposing  that  the  relations 


=  toy/gg^Ej, 


(4.2.11) 
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hold  in  a  pointwise  sense.  Introducing  the  basis  functions  V*  for  Ei,  and  X, 
for  b'  =  y/gB'  implied  by  substituting  Eqs. (4.2.8),  (4.2.9)  in  Eqs.(4.2.2)  and 
(4.2.3)  gives  nodal  equations 

d, d‘  -  p  =  0,  (4.2.13) 

dtd'  -  eijkdjHk  +  I*  =  0.  (4.2.14) 

where  d}  is  the  centred  finite  difference  operator  in  the  x}  coordinate  direction, 
and  dt  is  the  centered  time  difference  i.e.  if  the  nodal  interval  is  unity,  then 

dtf(t)  =  /(<  +  1/2)  —  f(t  -  1/2),  etc. 

The  homogeneous  Maxwell’s  equations  become 


=  0, 

a(b’  +  tiikd}  E*  =  0, 

resulting  in  a  need  to  satisfy  constitutive  relations 


d’  =  eo  {y  dtyfgdx'  dx 2  dx3  5(,)JV^(1)VS}  E, 

H  k  =  ±-{jdt  dx'  dx2  dx3^X(k)X,  J  b'. 


(4.2.15) 

(4.2.16) 


(4.2.17) 


(4.2.18) 


A  disadvantage  of  this  approach  is  that  in  view  of  Eqs. (4. 2.14)  and  (4.2.16), 
b*  and  d*  are  natural  variables  to  update,  but  Eq. (4.2.17)  is  not  explicit  for  E, 
in  terms  of  d*.  To  understand  the  seriousness  of  the  problem,  it  is  sufficient 
to  consider  the  two-dimensional  case,  with  g,}  constant.  The  lowest  order 
conforming  basis  functions  compatible  with  charge  conservation  yield  from 
Eq.(4.2.17),  assuming  unit  mesh-spacing,  that 

d1  =  eo  {. 9 “  [\  ([^]2  +  [dt] *)  |/]  Er  +  f  [5  +  «)]  E„ }  ,  (4.2.19) 

and  similarly  for  the  y  and  z  components.  d±  denotes  a  shift  of  plus  or  minus 
half  a  mesh-spacing  in  the  atk  co-ordinate.  The  g**  term  is  not  susceptible  to 
any  but  the  grossest  approximation.  Taking  Eq.(4.2.19)  as  it  stands  leads  a 
coupled  system  of  equations  involving  all  nodal  values  of  Ez  and  Ey.  Hence 
other  approaches  are  investigated  in  the  following  sections. 

4.2.2  Optimal  Schemes 

These  schemes  are  derived  by  replacing  the  pointwise  relations  Eqs.(4.2.11) 
and  (4.2.12)  with  weak  approximations  that  are  optimal  in  a  least  squares’ 
sense.  The  replacements  do  not  change  the  current  and  charge  density  terms, 
hence  attention  focuses  on 

It=l-  fdtdx'  dx2  dx3(^=—-—)  (4.2.20) 

2  J  \y/g  t0  po)  v  ' 


•  < 


•  i 
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It  is  convenient  to  introduce  here  the  inner  product  notation 

(a,  b)  =  [  it  dx 1  dx 2  dx3a(x\t)b(x\t),  (4.2.22) 

JD 

where  D  is  the  domain  of  interest. 

To  treat  Sly,  Eqs. (4.2.11)  and  (4.2.12)  are  rearranged  in  the  forms 

£.  =  (4.2.23) 

H.  =  (4.2.24) 

Introduce  the  representation  d*  =  then  the  optimal  approximation  to 

Eq.(4.2.23)  yields  E'  =  E^V^)  where 

(E,V{I),  V{t))  =  (J—9ijd%h  V(t)>.  (4.2.25) 

Similarly  Eq.(4.2.24)  gives 

(HiX(i),X(i))  =  9ijVXljhX(i)).  (4.2.26) 

Varying  (4.2.20)  yields 

6IX  =  (  dt  dx 1  dx2  dx3-^  (<?*<*%)  V0)  -  b'6b>X(l)X{j))  .  (4.2.27) 

J  y/9  V 

Hence  on  varying  Eq. (4.2.25),  and  inserting  it  and  Eq. (4.2.26)  into  Eq.(4.2.27); 

6JX  =  J  dt  dx 1  dx2  dx3  (tT6EiV{i)V(i)  -  Hi6b'X{i)X{i))  .  (4.2.28) 

and  6b' X^)  can  be  straightforwardly  calculated  for  variations  of  ♦  and 
A ,  yielding 

^  =  J  dt  dx1  dx2  dx3  -  <?t/J  =  0  (4.2.29) 
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=  -dtVk, 


(4.2.32) 


■37-r  =  —djXk\  ijk  cyclic  permutations  of  (123), 
ox3 


(4.2.33) 


and  the  lumping  approximations  (V},),  V{,-))  =  1,  (X^),  X^))  =  1  then  bring 
Eqs.(4.2.29)  and  (4.2.30)  to  the  form  of  Eqs.(4.2.13-4.2.14).  Note  that  the 
left-hand  sides  of  Eqs.(4.2.25)  and  (4.2.26)  have  to  be  lumped,  otherwise  even 
tor  the  lowest  order  conforming  elements,  these  are  not  explicit  formulae  for 
Ei  and  H,.  A  relatively  straightforward  refinement  to  the  lumped  scheme  is 
to  use  simple  iterative  schemes  to  improve  the  approximation.  The  diagonal 
dominance  of  the  mass  matrices  should  lead  to  rapid  convergence. 

6Ij  can  also  be  brought  to  the  form  (Eqs. (4. 2. 13-4. 2. 14)  if  Eqs.  Equations 
(4.2.11)  and  (4.2.12)  are  put  into  the  weak  forms 


(4.2.34) 

Ho 

(4.2.35) 

b‘  are  introduced  so  that 

=  {y/gD' ,  V{i)), 

(4.2.36) 

=  (VaB\  x(i)). 

(4.2.37) 

Thus  d'  and  b'  are  updated,  then  the  coefficients  D‘  and  B '(/?’  =  D(,)  V(,),  B'  = 
B(^A"(i))  follow  explicitly  from  Eqs.(4.2.36)  and  (4.2.37)  if  the  lumping  approx¬ 
imation  is  made,  and  finally  Eqs.(4.2.34)  and  (4.2.35)  yield  E,  and  H,. 

4.2.3  Hybrid  Schemes 

It  is  possible  to  derive  explicit  schemes  by  keeping  the  point  constitutive 
relation  Eq.(4.2.12)  for  the  magnetic  field  and  the  optimal  approximation 
Eq.(4.2.25)  or  Eq.(4.2.34)  for  the  electric  field.  We  shall  not  pursue  this  option 
further  here. 


4.3  Properties  of  the  Schemes 

In  this  section  the  lowest  order  conforming,  charge  conserving  elements  are 
considered.  It  will  also  be  assumed  that  gij  and  y/g  are  constant  everywhere, 
corresponding  to  the  case  where  physical  space  is  split  into  congruent  paral¬ 
lelepipeds.  Under  these  circumstances  it  is  convenient  to  introduce  the  oper¬ 
ators 

M°  =  £(fc]2+[t£]2)  +  f/,  (4.3.1) 
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Aa=l-fa+&L),  (4.3.2) 

where  d±  were  defined  following  Eq.(4.2.19).  The  inner  products  of  basis 
functions  may  be  written  simply  as 


MkMe  (i  =j  =  a) 
AaAkMc  ( i  =  a,j  =  b )  ’ 

MaM>  (i=j  =  a) 
AaAkMt  (i  =  a, j  =  b)  ' 


(4.3.3) 


(4.3.4) 


where  a,  b  and  c  are  a  permutation  of  the  spatial  indices  (123)  and  t  denotes 
time.  Henceforth  we  lump  in  time,  i.e.  set  A/‘  =  1. 

For  the  pointwise  scheme,  the  constitutive  relations  take  the  forms 

d1  =  Uty/9  (g"M2M3 E,  +  gx2AlA2M3E2  +  gl3AxM2A3E3)  (4.3.5) 


Hi  —  —  (jfn  A/1  b1  +  g\2Ax  A2b2  +  gi3/lM3b3) 


(4.3.6) 


Without  lumping,  the  optimal  equations  for  I\  are  in  variables  <f,  b'  where 


dx  -r  M2M33X,  bx  =  Mxb 1 


(4.3.7) 


and  the  constitutive  relations  are  e.g. 

M2M3Et  =  — (guM2M3dx  +  gl2AxA2M3^  Ag\3AxA3M2di)  (4.3.8) 


Mx H\  —  (g"M' '  +  yi2^1-42i2  +  gwAx  A3b3^  .  (4.3.9) 

For  I2  we  find  that  E,  is  related  to  d'  in  the  same  way  that  E,  depends  on  <f 
and  similarly  for  the  magnetic  field,  i.e.  when  the  gi}  are  global  constants,  the 
optimal  1 1  and  12  schemes  are  identical. 

Comparing  Eqs.(4.3.6)  and  (4.3.9)  shows  that  the  pointwise  relation  for 
Hi  is  equivalent  to  the  optimal  one  after  lumping  in  space,  i.e.  setting  AT  = 
1.  The  effect  of  lumping  is  different,  in  Eq.(4.3.6)  it  actually  sharpens  H,, 
whereas  in  Eq.(4.3.9)  the  off-diagonal  terms  are  smoothed.  In  the  absence 
of  spatial  lumping  the  pointwise  scheme  should  produce  a  smoother  H,  and 
might  be  preferred.  However  the  implicitness  of  Eq. (4.3.5)  means  that  only 
a  hybrid  scheme  would  be  praciicab'0,  and  if  the  M'  =  1  such  a  scheme  is 
indistinguishable  from  the  fully  optimal  ones.  The  difference  between  the  / j 
and  12  schemes  is  slight;  the  absence  of  a  division  by  y/g  may  be  an  advantage 
for  I2  when  the  elements  are  very  distorted.  However  if  such  situations  are 
avoided,  I\  is  preferable  because  its  operation  count  is  likely  to  be  smaller. 
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4.4  Maxwell’s  Equations 

The  principal  difference  between  the  cartesian  and  general  curvilinear  deriva¬ 
tions  of  the  field  equations  from  the  action  integral  (4.2.1)  is  the  need  for 
approximations  to  evaluate  Eqs.(4. 2.4), (4.2.5)  and  metric  tensor  elements  in 
the  latter  case.  Evaluation  of  the  metric  elements  was  dealt  with  in  Section 
3.2.  Choices  of  approximations  to  Eqs.(4.2.4)  and  (4.2.5)  lead  to  implicitness 
in  computing  either  E,  from  dJ  for  the  circulation  term  in  Faraday’s  Law  or 
d}  from  E,  in  applying  boundary  conditions;  the  latter  is  favoured  for  compu¬ 
tational  simplicity,  particularly  since  the  boundary  condition  is  implicit  only 
for  non-orthogonal  element  nets  at  external  boundaries.  The  scheme  we  shall 
implement  will  be  the  It  optimal  scheme,  with  (at  least  in  the  first  instance) 
lumped  ‘mass’  matrices. 

All  of  the  Virtual  Particle  schemes  discussed  in  Section  4.2  yield  the  ap¬ 
proximations  to  Maxwell’s  Equations: 

d,b'  =  -e,jkd}Ek,  d,b'  =  0  (4.4.10) 

a,d'  =  eijkdjHk  -  /’,  d,d'  =  Q  (4.4.11) 

The  precise  relationship  between  quantities  b',d’,  Et,Ht  and  the  nodal  ampli¬ 
tudes  depends  on  the  approximations  chosen  for  the  constitutive  relationships. 
In  the  lowest  order  lumped  approximation  to  the  7j  scheme  they  become  the 
nodal  amplitudes. 

The  analysis  leading  to  Eqs.(4.4.10)  and  (4.4.11)  shows  that  the  discrete 
equations  in  the  quantities  b‘,d‘,  Ek,H*,/‘  and  Q  are  identical  in  any  coordinate 
system.  Geometrical  information  appears,  along  with  the  permeability  and 
permittivity  tensors,  only  in  the  constitutive  relations  relating  d'  to  E,  and  b' 
to  H;. 

Figure  4.1  shows  the  location  of  the  nodes  that  arises  from  the  choice  of 
linear  potential  representation  in  three  dimensions.  Note  that  the  forms  of  the 
’difference’  equations  Eqs.(4.4.10)  and  (4.4.11)  are  identical  to  those  given  by 
the  Yee  algorithm  [13]  on  a  uniform  cartesian  net.  They  differ  from  the  Yee 
algorithm  in  the  variables  appearing  in  the  equations,  and  the  element  net  is 
not  necessarily  rectangular  in  physical  space. 

4.4.1  2-D  General  Geometry  Example 

Field  equations  for  the  virtual  particle  scheme  for  quadrilateral  elements  in 
2-D  are  discussed  in  this  section.  As  for  the  cartesian  case  considered  in  [6], 
we  shall  focus  on  the  case  of  linear  support.  We  assume  that  the  general 
coordinate  x3  is  ignorable,  i.e.  potential  and  field  derivatives  with  respect  to 
x3  are  zero. 

All  quadrilateral  elements  are  assumed  (without  loss  of  generality)  to  map 
onto  unit  square  elements  in  the  curved  (xl,x2)  coordinate  space.  If  we  assume 
mixed  linear/constant  support  over  the  elements  in  this  curved  space,  as  was 
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Figure  4.1:  The  location  of  nodes  on  the  unit  cube  element.  Crosses  give  E,, 
/'  and  <T  node  locations,  open  circles  give  Hi  and  b'  locations  and  solid  circles 
give  position  and  scalar  potential  node  locations. 


i 


Figure  4.2:  The  location  of  scalar  potential,  <t>,  vector  potential  (A\,  Aj,  A3j 
and  magnetic  field  fb1,  bs,  b3^  nodal  amplitudes.  Fields  H*  have  same  spatial 
locations  as  b*  and  E,,  d'  and  I1  have  same  spatial  locations  as  A i 
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used  for  the  2-D  cartesian  example  in  [6]),  then  the  location  of  nodal  values  on 
a  single  elements  will  be  as  illustrated  in  Fig  4.2.  These  nodes  are  interlaced 
in  time.  Located  at  half-integral  timelevels  are  values  of  A,,  b‘,  H,,  I*  and  at 
integral  timelevels,  nodal  values  of  E,,  d',  <fc  and  q  are  defined. 

It  follows  from  the  results  of  Reference  [6}  that  the  assembled  finite  element 
node  equations  can  be  written  in  operator  form: 

Initial  conditions:- 


did1  +  dzd2  =  p 

(4.4.12) 

dib1  +  djb2  =  0 

(4.4.13) 

Transverse  Magnetic  (TM)  equations 

d,d'  =  djHs  -  I1 

(4.4.14) 

dtd2  =  -d,H3  -  I2 

(4.4.15) 

Stb3  =  d2E,  -d,E2 

(4.4.16) 

Transverse  Electric  (TE)  equations 

^b1  =  -^Ea 

(4.4.17) 

a(b2  =  d^ 

(4.4.18) 

dtd3  =  d,H2  -  djH,  -  I3 

(4.4.19) 

4.5  Constitutive  Equations 

The  weak  approximations  Eqs.(4.2.25)  and  (4.2.26)  to  Eqs.(4.2.4)  and  (4.2.5), 
with  lumped  mass  matrices  give  the  simplest  explicit  expressions  for  E,  and 
H,.  Their  evaluation  gives  tensor  equations 

E,  =  Gf}d3  (4.5.1) 

H,  =  (4.5.2) 

where  elements  of  the  symmetric  tensors  G and  G are  sparse  matrices. 

The  lowest  order  lumped  approximations  lead  to  matrices  G„  which  are 
diagonal,  and  matrices  G,;  with  four  nonzero  elements  per  row.  Figure  4.3 
illustrates  this  for 

E2  =  Gfjd1  +  Gf2d2  (4.5.3) 

Ej  is  computed  at  the  central  node  in  Figure  4.3  from  d2  and  G22  at  that 
node,  and  from  d1  at  the  four  neighbouring  nodes  as  indicated  by  arrows. 
In  general,  different  values  of  Gf,  are  associated  with  each  arrow.  In  three 
dimensions,  Gf3d3  gives  similar  contributions  from  the  four  neighbouring  nodes 
at  which  d3  are  stored. 
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Figure  4.3:  Computation  of  Ej  at  the  central  node  using  the  lowest  order  lumped 
approximation  to  Eq.(f.5.1)  involves  values  of  d2  at  that  node,  and  from  d*  at 
the  four  neighbouring  nodes  as  indicated  by  arrows. 

4.6  Algorithm  Summary 

Four  different  schemes  for  solving  Maxwell’s  equations  have  been  derived,  and 
their  relative  merits  examined.  All  have  the  desirable  attributes  of  being  charge 
conserving,  having  good  dispersive  properties  and  being  well  suited  to  parallel 
computer  implementation.  For  the  situation  where  lumping  is  a  good  qual¬ 
ity/cost  compromise  and  the  finite  elements  are  not  too  distorted,  the  optimal 
1 1  scheme  with  lumping  appears  most  suitable;  it  is  this  scheme  that  will  form 
the  basis  of  the  benchmark  program. 

Local  block  coordinates.  The  multiple  decomposition  divides  the  com¬ 
putational  space  into  a  set  of  curvilinear  quadrilateral  (hexahedral)  blocks. 
Each  of  these  blocks  has  associated  with  it  a  local  cartesian  coordinate  sys¬ 
tem,  which  is  related  to  the  global  cartesian  coordinates  by  some  translations 
and/or  rotations.  For  the  purposes  of  integrating  Maxwell’s  equations  for  ten¬ 
sor  components  forward  in  time,  only  the  metric  tensor  is  required,  but  for 
extracting  physical  components  of  the  field,  and  for  the  particle  integrations, 
knowledge  of  the  transformation  between  cartesian  and  curved  space  coordi¬ 
nates  is  required. 

Fixed  timestep.  All  curved  space  finite  elements  are  chosen  to  be  cubes 
with  unit  sides.  For  the  present  discussion,  we  shall  further  assume  that  a 
fixed  timestep  of  unity  in  the  curved  space  is  used,  although  this  assumption 
can  be  straightforwardly  relaxed. 

Dimensionless  units.  For  fixed  timestep,  it  is  convenient  to  introduce 
dimensionless  units  for  physical  space  quantities.  These  imply  the  scaling  fac¬ 
tors  summarized  in  Table  4.1  to  convert  the  quantities  appearing  in  Eqs.(4.6.1 
)  -  (4.6.8)  into  SI  units.  In  the  table,  po,  to  and  c  have  their  usual  meanings, 
At  is  the  timestep,  Eq  is  some  reference  electric  field  (to  be  set  to  simplify  the 
particle  equations)  and  Zq  —  po c  =  377Q  is  the  impedance  of  free  space. 

Coordinate  transformations.  Given  the  basis  vectors  e,  and  e’  and 
the  local  block  cartesian  with  unit  vectors  are  x,,  the  electric  field  E  may  be 
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Table  4.1:  Scaling  factors  to  convert  quantities  to  SI  units 


Quantity 

Scale  factor 

electric  field 

Eo 

magnetic  field 

E0/c 

length 

cAt 

time 

At 

velocity 

c 

basis  vector  (et) 

cAt 

reciprocal  basis  vector  (e‘) 

1/cAt 

metric  tensor  elements  (g,}) 

(cAty 

A„$,E„b' 

E0cAf 

<r,i\Q 

EqcAI*  /  Z0 

H, 

EocAt  /  Zq 

permittivity  (e) 

^0 

permeability  (p) 

^0 

written 

E  =  EjXj  =  E,e'  (4.6.1) 

where  E,  are  the  local  cartesian  components  of  the  physical  electric  field,  and 
Ej  are  the  covariant  components. 

Dotting  Eq.(4.6.1)  with  e}  gives 

E,  =  (e,x})E}  (4.6.2) 

which  may  be  written  in  matrix  notation  E  =  TE,  where  the  3x3  matrix  T  has 
elements  7),  =  e,  xr  Similarly,  the  physical  magnetic  intensity  H  is  related 
to  the  covariant  components  //,  by 

Hi  =  TijHi  (4.6.3) 

The  evaluation  of  e,  and  thence  TtJ  was  treated  in  Section  3.2. 

4.6.1  The  Timestep  Loop 

The  timestep  loop  for  the  integration  of  the  field  equations  deals  exclusively 
with  tensor  field  components.  The  input  from  the  particle  integration  are  the 
contravariant  currents  I',  and  the  output  to  the  particle  integration  are  field 
components  E,  and  b'.  Details  of  the  particle  integration  and  boundary  con¬ 
dition  will  be  discussed  later.  The  finite  element  field  equations  are  assembled 
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within  each  block,  and  glue  patch  data  exchange  is  used  to  complete  the  as¬ 
sembly  at  block  surfaces.  This  leads  to  the  following  steps  in  the  timestep 
loop: 

1 .  compute  current  from  particle  move 

2.  obtain  H  from  b  in  each  block 

(XmW  -  f~V))  =  0  (4.6.4) 

3.  compute  displacement  currents  contributions  in  each  block 

<f  =  -  rW(i))  (4.6.5) 

4.  assemble  d '  contributions  at  block  surfaces  by  gluepatch  data  exchange. 
This  step  applies  the  ‘internal’  boundary  conditions:  e.g.  periodic,  sym¬ 
metry,  neumann  and  domain  decomposition  boundary  conditions 

5.  update  <f  and  apply  surface  boundary  conditions  (conductors,  resistive 
walls,  external  circuit  couplings,  etc)  to  d*.  At  internal  boundaries,  the 
new  d*  is  computed  from 

(-d<,)^i>=d'  (4.6.6) 

6.  obtain  E  from  d  in  each  block 

(Vi(E,  -  SiLli))  =  0  (4.6.7) 

eV0 

7.  assemble  E,  contributions  across  internal  boundaries  by  glue  patch  data 
exchange 

8.  apply  surface  boundary  conditions  to  Ei 

9.  advance  b'  in  each  block 

^  =  -eilkd,Ek  (4.6.8) 

10.  compute  particle  accelerations 

In  the  above  equations,  lowest  order  lumping  is  used  to  make  the  field 
equations  and  constitutive  relations  explicit.  Otherwise,  iteration  or  direct 
solution  will  lead  to  further  message  passing  between  blocks  as  <f,  Ei ,  etc  are 
refined.  Items  (I),  (4)  and  (10)  are  dealt  with  in  more  details  in  the  next  two 
chapters. 


Chapter  5 

Particle  Equations 


This  chapter  presents  the  Virtual  Particle  charge  and  current  as¬ 
signment  schemes  in  general  curvilinear  coordinates  for  the  linear 
basis  function  finite  elements  discussed  in  the  previous  chapter  for 
Maxwell’s  equations.  Assignment  is  the  same  in  all  coordinate  sys¬ 
tems  and  has  the  charge  conserving  property.  Treatment  of  the 
equations  of  motion  using  curved  space  coordinates  for  positions 
and  cartesians  for  momenta  are  outlined. 


5.1  Variational  Formulation 

In  general  curvilinear  coordinates  (x1,  x2,  x3)  the  action  integral  may  be 
written 

I  =  Jdt  dx 1  dx 2  dx’y/gfyE,  D'  -  H,  B')  +  j'  A,  -  p<*}  +  1K  (5.1.1) 

where  Ik  is  the  kinetic  lagrangian.  If  we  further  assume  that  the  distribution 
function  is  represented  by  a  set  of  sample  points  (ie  ‘superparticles’),  then  the 
source  terms  in  the  field  lagrangian  become 


J’  =  E  -%«(**  -  ~  *p)* 

p 

p  =  E  -  *p) 


and  the  kinetic  lagrangian  term  becomes 


(5.1.2) 


(5.1.3) 


(5.1.4) 


The  sums  in  p  are  over  particles,  each  with  charge  qp. 

Treating  /  as  a  functional  of  the  vector  potential  A„  the  scalar  potential 
4>  and  particle  coordinates  {xp}  led  to  Euler-Lagrange  equations  representing 
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Maxwell’s  equations;  as  shown  in  the  previous  chapter,  the  discrete  approxima¬ 
tions  to  these  differential  equations  Me  obtained  by  substituting  test  function 
approximations  for  4>,  A,  and  xp  and  taking  variations  with  respect  to  the 
nodal  amplitudes. 

The  use  of  sample  points  (‘superparticles’)  reduces  the  velocity  space  in¬ 
tegrals  to  sums  over  particles,  and  transforms  the  Vlasov  equation  to  the  rel¬ 
ativistic  equations  of  motion  for  the  superparticles.  Source  terms,  Eqs.(5.1.2) 
and  (5.1.3),  arise  from  variations  with  respect  to  potentials  in  the  differential 
limit.  These  variations  give  the  charge  and  current  assignment  schemes  in 
the  finite  element  case,  as  illustrated  in  Sections  5.2  and  5.3.  Section  5.2.1 
shows  that  the  assignment  has  the  charge  conserving  property  even  for  non- 
orthogonal  element  nets.  In  the  differential  limit,  variation  of  the  action  with 
respect  to  particle  coordinates  gives  the  relativistic  momentum  equation. 

The  treatment  of  the  equations  of  motion  is  outlined  in  Section  5.4  In  prin¬ 
ciple,  the  equations  of  motion  can  be  obtained  from  Eq.(5.1.1)  by  considering 
variations  with  respect  to  particle  positions.  In  practice,  this  has  proved  too 
arduous  except  in  the  limit  of  infinitessimal  timestep,  and  so  recourse  has  been 
taken  to  the  conventional  finite  difference  methods  to  discretise  the  equations 
of  motion  in  time. 


5.2  2-D  Assignment 

The  finite  element  test  function  approximations  to  the  potentials  may  be  writ¬ 
ten  <f>  =  <PU  and  A,  =  A(i)W(,-),  where  0  and  A  are  nodal  amplitudes,  and 
sums  are  implied  over  element  nodes  (cf  Ref  [6]). 

Variation  of  Eq.(5.1.1)  with  respect  to  #  yield  charge  assignment: 

Q=  ( «ft£*tf((*i,  x*,  x3p)(t),t)  (5.2.5) 

J  v 

Similarly,  variation  with  respect  to  A  gives  current  assignment: 

I*  =  /*  £  0  (5.2.6) 

J  v 

The  integrals  for  Q  and  I*  are  evaluated  in  exactly  the  same  manner  as 
described  in  Ref  (6).  For  example,  if  in  a  general  quadrilateral  element  U  and 
Wt  have  the  same  linear  /  piecewise  dependence  as  their  cartesian  counterpart 
[6,  Section  4.3],  then  the  fractions  of  the  charge  assigned  from  a  charge  qp  at 
position  (i‘,fJ)  (marked  by  open  circles)  to  the  four  nodes  at  the  corners  of 
the  element  as  shown  in  Figure  5.1(a)  are  given  by 


SQnw  =  qp(l  -  xl)x2  &Qne  — 

SQse  =  9Px*(l  -  x 2)  =  qp(  1  -  x')(l  -  x2) 


(5.2.7) 
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Figure  5.1:  (a)  Charge  assignment  from  a  single  particle  and  (b)  current  as¬ 
signment  from  a  single  particle  trajectory  segment  for  linear  potential  test  func¬ 
tions.  The  figures  at  the  left  and  right  are  respectively  physical  and  curved  space 
representations. 


Similarly,  evaluating  Eq.(5.2.6)  for  the  transverse  magnetic  (TM)  contravariant 
current  components  from  a  trajectory  segment  from  (x*,x,2)  to  (x},  x})  yields 

6\N  =  qpX2Ai'  6\s  =  qp(  l-X')Ax'  (5.2.8) 

£|e  =  qpXlAx7  =  9P(  1  —  X')Ai2 

where  Xk  =  (x}  +  x*)/2,  Ax*  =  x$-x*,  where  k=l  or  2.  Coordinates  (x*,xf) 
and  (x$,x*)  are  respectively  the  start  and  end  the  trajectory  segment  (solid 
line  in  Figure  5.1(b))  in  the  finite  element.  Trajectory  segments  are  assumed  to 
be  straight  lines  in  the  curved  ( xl,x 7)  space.  Figure  5.1(b)  shows  the  position 
of  the  virtual  particle  at  ( Xk,Xk )  as  a  small  open  circle  on  the  trajectory 
segment  and  labels  the  location  of  the  current  nodes  referred  to  in  Eq.(5.2.8). 

The  £3  direction  current  components  leads  to  terms  quadratic  in  the  path 
parameter,  and  will  so  lead  to  two  virtual  particles  being  required  to  give 
the  trajectory  segment  contributions  to  the  '-mrents  at  the  four  corner  nodes. 
Figure  5.2  shows  the  virtual  particle  locations  for  the  TM  and  TE  current 
components  from  the  same  trajectory  segment  in  curved  space.  The  I3  current 
require  contributions  from  two  virtual  particles,  at  positions 

(X1,XI)±(Ax1,  AxJ)/2^3, 

both  with  strengths  Ax3/2.  Summing  the  contributions  of  the  two  virtual 
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Figure  5.2:  Current  assignment  a)  for  TM  current  components  and  b)  for  the 
TE  (=  l3^  current  component.  The  open  circles  show  the  locations  of  the  virtual 
particles  on  the  trajectory  segment  (solid  line)  lying  in  the  current  element. 


particles  gives 


^Ijviv 

^Ise 


Ax3 

Ax3 

Ax3 

Ax3 


(1  -*')(!  -X2)  + 


Ax1  Ax2 


12 


(1  -  Xl)X 2 


Ax'Ax2 


12 


XlX 2  + 


Ax’Ax2 


12 


X’(l  -  .Y2) 


Ax’Ax2 


12 


(5.2.9) 

(5.2.10) 

(5.2.11) 

(5.2.12) 


5.2.1  Charge  Conservation 

Charge  and  current  assignment  are  linear  operations,  so  by  linear  superposi¬ 
tion,  conservation  for  one  trajectory  moving  through  a  single  time  step  im¬ 
plies  the  same  for  the  sum  of  all  trajectories.  Summing  all  contributions  to 
Eqs. (5.2.8)  from  a  single  particle,  gives  the  same  as  the  difference  of  Eqs.(5.2.7) 
at  start  and  end  points,  ie  the  linear  quadrilateral  (hexahedral  in  three  dimen¬ 
sions)  element  case  satisfies 

dtQ  =  -dk\k  (5.2.13) 


where  the  symbol  d  denotes  a  centred  difference  arising  from  assembling  the 
finite  element  contributions,  and  Q  and  I*  are  nodal  amplitudes.  Equations  of 
the  form  (5.2.13)  can  be  shown  to  be  generally  satisfied  for  VP  algorithms. 
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5.3  3-D  Assignment 

5.3.1  Charge 

If  we  label  nodes  on  the  unit  cube  element  (i,j,k)  and  write  the  assignment 
funr  ,on  in  terms  of  its  product  parts 

U,}k(i\x2,x3,t)  =  tv,(x1)w}(x2)wk(i3)  (5.3.1) 

From  Eq. (5.2.5),  the  charge  assigned  to  element  node  (i,j,  k )  where  i,j,  k  =  0 
or  1  from  a  charge  qp  becomes 

Q.jk  =  qpw,(xl)wJ(x2)wk(x3)  (5.3.2) 

where 

te0(x)  =  1  -  x  ;  xvi(x)  =  x  (5.3.3) 

(x',x2,x3)  is  the  displacement  of  the  particle  position  from  the  element  corner 

(0,0,0). 


5.3.2  Current 

Using  the  impulse  approximation  to  particle  motion  as  described  in  [6],  and  the 
factorisation  of  W  reduces,  Eq.(5.2.6)  to  the  contribution  of  a  single  particle 
to 

dsqpAxlw}(. X2  4  sAx2)wk(X3  4  sAx3)  (5.3.4) 

J  J-\ 

The  quadratic  integral  is  evaluated  exactly  by  point  Gaussian  quadrature,  so 
Eq.(5.3.4)  can  be  written  as  the  contribution  from  virtual  particles  at  x±:- 

l$>*  =  [«M*+)“'*(x3)  4  u;;(xl )u>*(x3  )]  (5.3.5) 

where  Ax'  =  x'j  —  x'  are  components  of  the  length  of  the  trajectory  segment 
in  the  current  element,  and  X'  =  (xj  4  x;)/2  are  the  components  of  the  mean 
position.  Indices  j  and  k  take  values  0  or  1.  The  positions  of  the  virtual 
particle  have  coordinates 

*53  (5M» 

The  corresponding  expressions  for  current  components  1^^  and  l3^  are  given 
by  simultaneously  cyclically  rotating  component  indices  (123)  and  node  indices 

Figure  5.3  gives  a  cartesian  space  sketch  of  assignment  according  according 
to  E)q.(5.3.2)  for  charge  and  b)  Eq.(5.3.5)  for  current. 

In  practice,  it  more  convenient  to  evaluate  explicitly  the  integrals  for  /' 
rather  than  use  quadrature.  Equation  (5.3.5)  then  becomes 

%k  =  tMXJ)u,*(*3)  4  (i  -  l-)(k  -  I)^£! 

and  similarly  for  I2  and  I3  by  cyclic  rotation  of  indices. 


(5.3.7) 
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Figure  5.3:  (a)  Charge  assignment  from  a  single  particle  and  (b)  I]  current 
assignment  from  a  single  particle  trajectory  segment  for  linear  potential  test 
functions  in  physical  space. 


5.4  Particle  Motion 


5.4.1  Positions 

The  change  in  particle  position  is  computed  in  the  same  manner  as  the  carte¬ 
sian  case.  The  change  of  position  is  computed  from  the  leapfrog  approximation 

x‘(n  +  l)  =  x*(n)  + *>*("  + 1/2)  (5.4.1) 


to 


dxk 

nr 


(5.4.2) 


where  argument  n  is  used  to  denote  timelevel. 

The  evaluation  of  Eq. (5.4.1)  requires  the  computation  of  vk  from  the  mo¬ 
mentum.  To  avoid  the  explicit  treatment  of  the  Christoffel  symbol  in  the 
momentum  equation,  momenta  are  stored  in  terms  of  their  local  cartesian 
components.  These  components  are  in  units  of  m0c,  where  m0  is  the  rest  mass 
of  the  species  in  question. 

Given  the  cartesian  momenta  p  =  p,x\  the  contravariant  components  are 
given  by 

p,  =  x'.ejp3  (5.4.3) 

or  in  matrix  form 

p  =  TTp  (5.4.4) 


The  relativistic  gamma  is  given  by 


7 


(5.4.5) 


Hence  Eq.(5.4.1)  may  be  written  in  matrix  form 


x(n+ 1)  =  x(n)  +  (T7')  ]p/7  (5.4.6) 


Note  that  Eq.(5.4.6)  is  not  properly  time  centred  unless  T  is  evaluated  at  the 
half  timelevel,  in  which  case  the  equation  becomes  implicit.  In  the  first  in¬ 
stance,  we  shall  use  the  explicit  approximation,  although  this  can  be  refined  by 


( 
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using  some  simple  predictor-corrector  on  Eq. (5.4.6),  or  by  solving  the  cartesian 
space  form  of  Eq.(5.4.1)  and  transforming  the  cartesian  coordinate  to  curved 
space. 

5.4.2  Momenta 

The  finite  element  approximation  in  space  and  time  for  the  particle  accelera¬ 
tions  leads  to  unwieldy  expressions;  consequently,  we  have  replaced  the  time 
derivative  by  a  finite  difference.  Treating  the  momentum  equation  in  curved 
space  introduces  the  Christoffel  symbol  Fj^  in  the  quadratic  centripetal  term: 


~  r‘km*mPi  =  Fk  =  q(Ek  +  eklmv‘bm) 


(5.4.7) 


A  more  straightforward  approach  is  to  use  the  cartesian  leapfrog  scheme 
for  the  momentum 

^=«(E  +  vxB)  (5.4.8) 

The  local  cartesian  components  of  the  electric  and  magnetic  fields  can  be  found 
using  the  contravariant  basis  vectors.  Using  matrix  notation  and  T  as  defined 
in  Eq.(5.4.4),  the  cartesian  electric  field  is  related  to  the  covariant  components 
by 

£  =  T-*£  (5.4.9) 

and  the  cartesian  magnetic  field  components  are  given  by 


B  =  Tr6/def(T) 


(5.4.10) 


With  the  choice  of  field  units 


(5.4.11) 


the  resulting  leapfrog  approximation  to  Eq.(5.4.8)  becomes 
p(n  +  1/2)  -  p(n  -  1/2)  =  2E  +  (p(n  +  1/2)  -I-  p(n  -  1/2))  x  B/7  (5.4.12) 
which  is  readily  solved  using  standard  methods  [12,  Chap  4). 


44 


RFFX(92)52 


General  Geometry  MIMD  PIC 


Chapter  6 


Boundary  Conditions 


This  chapter  summarises  the  application  of  interior  and  exterior 
boundary  conditions  to  both  the  field  equations  and  the  particle 
equations. 


6.1  Introduction 


The  field  boundary  conditions  fall  into  two  classes:  interior  and  exterior.  In¬ 
terior  boundary  conditions  are  those  where  the  computational  domain  is  ex¬ 
tended  to  complete  the  assembly  of  Ampere’s  equation  at  boundary  nodes. 
Instances  of  the  interior  boundary  conditions  are  periodic,  Neumann  (sym¬ 
metry)  and  glue  patch  boundaries.  Their  application  involves  adding  some 
‘external’  contribution  to  dt cT  from  elements  of  the  same  form  as  those  in  the 
body  of  the  computational  domain.  Interior  field  boundary  conditions  are 
treated  in  Section  6.2. 

Exterior  field  boundary  conditions  (Section  6.3)  couple  the  Ampere’s  equa¬ 
tion  for  boundary  nodes  to  external  circuit  equations  relating  surface  currents 
to  surface  fields.  The  simplest  external  circuit  is  the  perfect  conductor,  where 
tangential  electric  fields  are  zero.  If  nonzero  tangential  fields  are  applied  at  the 
conducting  wall  boundary,  then  one  obtains  the  fixed  applied  field  boundary 
condition.  Slightly  more  involved  is  the  resistive  wall,  which  differs  from  the 
perfect  conductor  in  that  tangential  electric  and  magnetic  fields  are  related 
by  a  nonzero  surface  impedance.  More  general  external  circuit  couplings  at 
boundaries,  involving  inductance  and  capacitance  lead  to  differential  equations 
relating  surface  electric  fields  to  surface  currents. 

Particle  boundary  conditions  may  be  likewise  classed  as  interior  or  exterior. 
Interior  conditions  follow  the  same  classification  as  the  fields,  and  exterior 
conditions  are  either  particle  emission  or  absorption.  These  are  treated  in 
more  detail  in  Section  6.4 
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6.2  Fields  at  Interior  Boundaries 

For  all  cases,  the  application  of  interior  field  boundary  conditions  involves 
assembling  contributions  to  d  (Eq. (4.6.5))  across  adjacent  block  boundaries. 
This  assembly  is  performed  by  adding  in  the  gluepatch  contributions.  Figure 
6.1  illustrates  the  gluepatch  data  transfer.  Data  stored  on  the  logical  (t,j,  it) 
lattice  of  nodes  in  block  1  are  copied  to  the  gluepatch,  and  then  data  is  added 
from  the  gluepatch  to  the  corresponding  node  on  block  2.  On  both  source 
and  target  blocks  are  reference  points  (solid  circles  and  crosses  in  Figure  6.1) 
which  specify  the  location  and  orientation  of  the  blocks  with  respect  to  the 
glue  patch.  The  reference  points  are  used  to  perform  the  field  component  and 
indexing  transformations  between  blocks. 


gluepatch 


Figure  6.1:  All  interior  boundary  conditions  are  applied  by  passing  data  between 
blocks  via  gluepatches. 

Two  types  of  field  gluepatch  are  required: 

Surface  patch  to  exchange  the  two  tangential  field  components  with  nodes 
common  to  the  faces  of  two  blocks. 

Line  patches  to  exchange  the  one  tangential  field  component  with  nodes 
common  to  the  edges  of  four  or  more  blocks. 

Interior  boundary  conditions  are  applied  by  adding  the  gluepatch  values 
<^/ue  to  the  corresponding  node  accumulator 

*  :=  +  4-  (6-2.1) 

A  simple  example  of  the  use  of  gluepatches  is  to  apply  doubly  periodic 
boundary  conditions  to  a  2-D  rectangular  domain.  Four  gluepatches  are  re¬ 
quired:  two  surfaces  patches  to  connect  the  north  to  south  boundary  and 
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the  east  to  west  boundary,  and  two  line  patches  to  connect  the  NE  to  SW 
corner  and  the  SE  to  NW  corner.  Gluepatch  exchanges  are  required  twice 
each  timestep,  once  for  di  to  complete  the  integration  of  Ampere’s  equation, 
and  once  for  E,  to  complete  the  computation  of  E  from  d  in  evaluating  the 
constitutive  relationship. 


6.3  Fields  at  Exterior  Boundaries 

Exterior  boundary  conditions  are  applied  to  blocks  through  boundary  con¬ 
dition  patches,  which  are  stuck  onto  the  blocks  in  the  same  manner  as  the 
gluepatches,  only  now  they  connect  to  external  conditions  rather  than  to  an¬ 
other  block. 

If  we  let  Et  =  — n  x  (n  x  E)  be  the  (physical)  electric  field  tangential  to  the 
boundary  surface  whose  unit  normal  is  n,  then  exterior  boundary  conditions 


may  be  summarised  as  follows:- 

E<  =  E  :  specified  applied  field 

(6.3.1) 

E(  =  0  :  perfect  conductor 

(6.3.2) 

Et  =  Z\,  resistive  wall 

(6.3.3) 

Ei  =  E((j.)  :  circuit  equation 

(6.3.4) 

E  is  the  (given)  applied  field,  Z  is  a  given  surface  impedance,  and  j,  (curl  (H)) 
is  the  surface  current.  The  circuit  equation  case  is  in  general  a  differential 
equation  in  time  relating  E<  to  j,  of  which  Eqs.(6.3.1-6.3.3)  are  special  cases. 
For  the  present  discussion  we  shall  limit  ourselves  to  the  three  special  cases 
listed. 

The  ease  with  which  the  boundary  conditions  can  be  applied  depends  on 
the  nature  of  the  finite  element  not  at  the  boundary  surface.  We  identify  four 
different  cases 

•  orthogonal  nets 

•  boundary  orthogonal  nets 

•  surface  orthogonal  nets 

•  general  curvilinear  nets 

The  first  three  cases  usually  lead  to  local  surface  equations  for  the  fields 
which  are  explicit,  whilst  the  fourth  leads  to  all  implicit  equations  which  has 
either  to  be  solved  by  iteration,  or  lumped  further  to  give  explicit  expressions 
for  displacement  fields. 

Boundary  conditions  are  applied  to  Ampere’s  equation  through  prescribed 
(contravariant)  displacement  fields  and/or  displacement  currents  at  boundary 
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nodes.  Boundary  conditions  on  Faraday’s  equation  are  applied  by  prescribing 
the  corresponding  (covariant)  electric  fields. 

We  shall  consider  the  application  of  boundary  conditions  to  surfaces  per¬ 
pendicular  to  basis  vector  e1,  as  illustrated  in  Figure  6.2. 


Figure  6.2:  Basis  vectors  e2  and  e3  lie  on  surface  perpendicular  to  reciprocal 
basis  vector  e1 


6.3.1  Applied  Field 

The  applied  field  boundary  condition  sets  the  surface  tangential  electric  field 
to  some  specified  value  E,  so  that  at  the  surface. 

n  x  (E  -  E)  =  0  (6.3.5) 

For  the  face  illustrated  in  Figure  6.2,  n  =  e'/l*1 1>  80  Eq. (6.3.5)  yields  e1  x 
(E  -  E)  =  e1  x  e'(E,  -  £<) 

=  -^=(e3(E2  -  E2)  -  e2(E3  -  £3))  (6.3.6) 

V9 

Taking  the  dot  product  of  Eq.(6.3.6)  with  eJ  and  e3  gives  the  covariant  field 
component  boundary  conditions 


£2  =  E2  ;  £3  =  E3 


(6.3.7) 


In  the  continuum  limit,  Eq.(6.3.6)  can  likewise  be  written  in  terms  of  the 
contravariant  fields: 


(6.3.8) 
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For  the  boundary  orthogonal  case,  e1  x  ej  =  0,  and  Eqs. (6.3.8)  reduce  to 
simple  Dirichlet  conditions  for  <P  and  d3: 


,  e  ,  * 

«>/?  ty/9 


(6.3.9) 


One  possible  approach  to  handling  the  boundary  conditions  in  the  simula¬ 
tion  code  is  to  use  Eqs. (6. 3. 8)  in  a  pointwise  fashion  at  boundary  nodes.  A 
more  consistent  approach  is  to  use  Eqs.(6.3.7)  for  the  covariant  fields,  and  the 
lumped  finite  element  constitutive  relations  to  solve  for  d2  and  <P  at  boundary 
nodes:- 

Ej  —  Gjid1  +  Gad2  +  Gi3<P  (6.3.10) 

E3  =  Gad'  +  Gud1  +  Gad3  (6.3.11) 

For  boundary  orthogonal  elements  (G?  1  =  G31  =  0),  conditions  of  the  form 
Eqs.  (6.3.9)  may  again  be  recovered. 

For  surface  orthogonal  elements,  Gu  =  0,  and  Eqs.(6.3.10  and  6.3.11) 
reduce  to 

d1  =  G^(Ei-Gnd')  (6.3.12) 

<P  =  G^(E3~  Gud)  (6.3.13) 

In  two  dimensions  (where  the  3  coordinate  is  negligible,  say,  and  Gj 3  =  Gn  = 
0),  explicit  equations  of  the  form  Eqs.(6.3.12)  -  (6.3.13)  can  always  be  found. 

For  general  curvilinears,  Eqs. (6. 3. 10  and  (6.3.11)  lead  to  the  matrix  equa¬ 
tions  for  d1  and  <P 

(Gn  —  GuG^Gi^d1  —  (E,  —  GaG^Bi)  +  Gn  —  GuG^Gn)d'  (6.3.1 1) 

(G33  —  GuG^Ga)^  =  (E3  —  GhGjj  E2)  +  (G31  —  G32G22  G21  )dl  (6.3.15) 

which  are  discrete  analogues  to  Eqs.(6.3.8). 


•  < 


6.3.2  Perfect  Conductor 

Perfect  conductor  boundary  conditions  differ  from  the  applied  field  case  only 
in  that  the  applied  field  is  zero. 


6.3.3  Resistive  Wall 

The  resistive  wall  boundary  condition,  applicable  for  small  skin  depth  surfaces, 
relates  the  surface  current  j,  to  the  tangential  electric  field  at  the  surface: 


n  x  H  =  j. 


(6.3.16) 


-  n  x  (n  x  E)  =  Zj,  (6.3.17) 

For  the  surface  shown  in  Figure  6.2,  n  =  e1/  |  e1  |,  and  Eq.(6.3.17)  gives 

e1  x  (E  -  Zj,)  (6.3.18) 


j 

j 
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Following  the  reasoning  given  above  for  the  applied  field  case,  the  surface 
covariant  fields  satisfy 

£3  =  Zj„  ;  £3  =  Zj, 3  (6.3.19) 

The  corresponding  equation  to  Eq.(6.3.8)  is 

d2  -  tZJ2  =  d1  ^  d3  -  eZJ3  =  d1  ^-4  (6.3.20) 

j  e1  |  |  e1  | 

and  Eq. (6.3.14)  is  replaced  by 

(G22  -  G»G3G»)(*  -  tZl 2)  =  (G21  -  G23G^ G3i  )d*  (6.3.21) 

The  evolutionary  equation  for  d2  is  obtained  by  using  Eq. (6.3.21)  to  eliminate 
the  surface  current,  I2,  from  the  finite  element  assembled  equation  for  the 
surface  nodes. 

6.4  Particles  at  Interior  Boundaries 

Each  field  surface  gluepatch  has  a  corresponding  particle  gluepatch.  Particles 
passing  through  a  gluepatch  from  a  source  block  to  a  target  block  are  passed 
from  the  storage  areas  of  the  source  block  particle  tables  to  that  of  the  target 
block  via  the  gluepatch  buffer. 

Position  coordinates  are  transformed  from  the  curved  space  components 
of  the  source  block  to  those  of  the  target  block.  Momentum  coordinates  are 
converted  from  the  local  cartesian  coordinates  of  the  source  block  to  those  of 
the  target  block. 

6.5  Particles  at  Exterior  Boundaries 

Particles  are  lost  from  the  simulation  domain  by  absorption  at  exterior  bound¬ 
aries,  and  are  introduced  by  emission  at  boundaries. 

Emission  is  determined  by  additional  physical  models.  For  example: 

Thermionic  emission,  where  surface  electric  fields  draw  electrons  out  of  a 
cathode  surface  boundary  layer. 

Secondary  electrons,  whose  distribution  is  determined  by  impacting  parti¬ 
cles  and  boundary  material  properties. 

beam  injection,  where  external  sources  determine  density  and  momentum 
distributions. 


Chapter  7 

Data  Organisation 


This  Chapter  outlines  the  data  organisation.  A  multilevel  tabular 
approach  is  proposed  in  order  to  obtain  efficient  MIMD  implemen¬ 
tation  whilst  maintaining  flexibility. 


7.1  Introduction 

The  data  organisation  problem  to  be  resolved  is  how  to  map  a  multiblock  de¬ 
composition  of  a  complex  microwave  device  onto  a  distributed  memory  MIMD 
computer  such  as  the  Intel  iPSC  or  a  Meiko  i860/Transputer  MIMD  comput¬ 
ers.  The  objective  is  to  maximise  program  portability  and  flexibility  wthout 
compromising  efficiency. 

The  principal  unit  of  decomposition  is  the  uniblock  -  which  carries  both 
field  and  particle  data  from  a  volume  of  space.  To  handle  complex  shapes, 
some  blocks  will  need  to  be  small,  whereas  for  simple  shapes,  physical  bound¬ 
ary  condition  constraints  allow  large  blocks.  If  desired,  large  blocks  can  be 
subdivided  to  facilitate  mapping  onto  the  MIMD  computer.  The  data  organi¬ 
sation  must  also  allow  several  small  blocks  to  be  collected  together  on  a  given 
process  to  obtain  effective  load  balance. 

Section  7.2  illustrates  the  multiblock  decomposition  for  a  coaxial  to  cylin¬ 
drical  to  rectangular  waveguide  junction.  The  global  and  local  data  organisa¬ 
tion  required  for  such  a  decomposition  is  treated  in  Section  7.3.  Section  7.4 
proposes  a  data  storage  within  the  block  which  allows  metric  data  compression 
for  blocks  with  symmetry  and  orthogonality  properties. 


7.2  The  Mesh 


Figure  7.1  shows  the  multiblock  representation  of  a  cylindrical  device  with  an 
attached  waveguide.  Each  block  is  subdivided  into  elements,  and  the  same 
parameterisation  is  used  for  the  coordinate  transformation  as  is  employed  for 
the  scalar  potential  (cf  Chapter  3). 
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Figure  7.1:  An  illustration  of  the  meshing  of  the  junction  of  a  coaxial  and  rect¬ 
angular  guide.  There  are  six  blocks: the  rectangular  guide,  the  central  cylinder 
and  four  in  the  annulus. 

The  multiblock  subdivision  of  the  computational  domain  minimises  the 
amount  of  global  data  and  interprocessor  message  passing,  and  simplifies  load 
balancing  across  processors.  Each  slave  block  only  requires  data  from  its  neigh¬ 
bours,  and  the  master  control  program  only  requires  information  about  the 
block  surfaces  (‘glue  patches’)  which  join  the  slave  blocks  together.  This  ar¬ 
rangement  offers  the  prospects  of  large  computational  intensity  and  a  weak 
Amdahl  limit  to  speedup  on  distributed  memory  MIMD  computers.  More¬ 
over,  the  simple  logical  cube  addressing  within  each  block  leads  to  fast  serial 
processing. 

Without  compromising  the  subdivision  of  the  spatial  mesh  into  blocks  to 
get  efficient  MIMD  processing,  one  can  further  demand  that  boundary  con¬ 
ditions  only  apply  at  the  surfaces  of  blocks;  this  eliminates  the  addressing 
problems  in  embedding  surfaces  within  blocks,  and  allows  surface  data  to  be 
passed  to  the  control  program  through  the  ‘glue  patch’  tables.  Further  saving 
of  computer  storage  and  time  arise  from  keeping  metric  information  and  ma¬ 
terial  property  data  only  in  those  blocks  where  they  are  needed.  When  many 
small  blocks  are  used  to  describe  a  complex  object,  load  balancing  is  achieved 
by  assigning  several  blocks  to  one  processor. 

In  summary,  the  efficient  meshing  strategy  uses 

•  A  multiblock  spatial  decomposition  where  segments  of  target  surfaces 
and  other  boundaries  are  coincident  with  block  surfaces. 
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Figure  7.2:  The  curved  space  schematic  of  the  device  in  Figure  7.1.  Blocks 
transform  to  rectangular  bricks  connected  by  gluepatches  (indicated  by  arrows). 

•  Indirect  (‘glue  patch’)  addressing  between  blocks,  and  logical  cube  (i,j,k) 
element  nets  within  blocks. 

•  Transfinite  interpolation  to  divide  the  curvilinear  hexahedral  multiblocks 
into  finite  elements 

Figure  7.2  gives  a  ‘curved  space’  schematic  of  the  device  illustrated  in  Fig¬ 
ure  7.1  with  an  additional  multiblock  extension  to  the  rectangular  waveguide. 
The  bottom  five  blocks  correspond  to  the  central  cylinder  and  annulus.  The 
arrows  indicate  block  faces  connected  by  gluepatches.  Block  6  is  the  connec¬ 
tion  from  the  annulus  (block  2)  to  rectangular  guide  (blocks  7-10).  Blocks 
7-10  have  been  included  to  illustrate  the  need  for  extra  line  gluepatches  for 
the  field  solver  when  more  than  three  block  edges  meet  at  a  line. 

Figure  7.3  summarises  the  timestep  loop  operations  for  a  general  uniblock. 
The  uniblock  (Figure  7.3(a))  has  regular  mesh  of  unit  spacing  (in  curved 
space).  Its  surface  is  either  covered  by  boundary  patches  or  is  connected 
to  surface  glue  patches  (plus  line  patches  at  edges).  All  boundary  conditions 
and  exchanges  of  particles  and  fields  between  uniblocks  we  handled  via  the 
gluepatches  and  their  buffers.  If  dynamical  load  balancing  is  excluded,  the 
only  interprocess  communication  required  is  the  exchange  of  gluepatch  buffer 
information  between  pairs  of  contiguous  blocks. 

Figure  7.3(b)  gives  the  steps  of  the  timestep  loop  for  the  uniblock.  The 
first  part  gives  the  updated  particle  momenta,  positions  and  currents,  and 
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Figure  7.3:  (a)  the  uniblock  surface  is  covered  by  boundary  or  glue  patches,  (b) 
the  timestep  loop  for  a  uniblock. 

exchanges  particles  with  neighbouring  blocks  and  boundaries.  This  part  is 
complete  only  when  all  the  particle  exchange  buffers  are  empty.  The  second 
part  is  the  field  solver,  where  the  update  of  Ampere’s  equation  leads  to  the 
exchange  of  displacement  fields,  and  the  update  of  Faraday’s  equation  leads  to 
the  exchange  of  electric  fields.  The  inner  loop  over  the  boundary  conditions 
is  relevant  only  where  necessary  for  the  three  dimensional  general  geometry 
case. 


7.3  Data  Addressing 

Figure  7.4  summarises  the  global  addressing  and  division  of  data  in  global  and 
local.  The  MIMD  virtual  particle  simulation  program  treats  a  microwave  de¬ 
vices  as  a  collection  of  uniblocks  connected  together  by  a  network  of  gluepatches, 
Each  gluepatch  is  glue  to  two  connecting  block  faces  (or  block  edges  in  the 
case  of  line  patches),  so  that  the  complete  network  can  be  described  by  stating 
the  positioning  of  the  two  faces  of  each  gluepatch  on  the  uniblocks. 

There  are  four  levels  in  the  global  addressing  of  this  network:  the  proces¬ 
sor,  process,  block  and  patch: 

Processor:  Since  we  want  the  same  program  to  run  with  the  minimum 
changes  on  many  different  parallel  computers,  a  distinction  is  made  between 
the  logical  processes  used  in  the  program  and  the  physical  processors  of  the 
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Figure  7.4:  Data  Addressing 


MIMD  computer.  The  translation  between  logical  program  processes  and 
physical  processors  can  then  be  kept  to  the  outermost  level,  where  the  pro¬ 
gram  is  interfaced  with  the  particular  parallel  computer  on  which  it  is  to  run. 
Although  there  is  a  one-to-one  correspondence  between  a  logical  process  and 
a  physical  processor,  the  numbering  of  the  processors  and  their  physical  con¬ 
nectivity  (e.g.  hypercube  or  mesh)  is  highly  system-dependent.  Subroutines 
and  tables  will  be  provided  to  translate  rapidly  between  the  above  logical  and 
physical  numbers.  By  this  method  we  can  move  to  a  new  MIMD  computer  by 
making  a  few  changes  to  a  small  number  of  tables  and  subroutines,  without 
altering,  in  any  way,  the  bulk  of  the  complex  simulation  program.  The  pro¬ 
gram  is,  therefore  written  in  terms  of  processes  which  may  be  thought  of  as 
logical  processors,  and  which  are  translated  to  physical  processor  numbers  by 
these  tables. 


Process:  Several  blocks  may  be  computed  on  each  processor  of  a  MIMD 
computer,  and  the  balancing  of  the  computational  load  between  the  processors 
is  achieved  by  moving  blocks  between  processors,  until  the  computational  load 
is  roughly  equal  on  all  processors.  Remembering  that  the  program  is  to  be 
written  in  terms  of  processes,  load  balancing  can  be  achieved  by  moving  blocks 
between  processes.  The  distribution  of  the  computation  across  the  parallel 
computing  resource  is  then  described  by  stating  which  blocks  are  on  which 
processes. 
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Block:  The  block  addressing  points  to  the  global  block  data  -  its  location 
in  space  relative  to  some  global  coordinates,  blocktype  and  global  particle 
attributes.  The  block  type  data  will  contain  metric  coefficients  needed  by  the 
field  solver  and  contravariant  basis  vectors  used  by  the  particle  integrator. 

Patch:  All  the  exchange  of  data  between  blocks  will  be  performed  through 
an  exchange  patch  subroutine.  This  must  recognise  whether  a  target  patch  to 
which  it  is  to  send  data  is  a  patch  on  a  block  belonging  to  its  process,  or  on  a 
block  being  computed  by  another  process.  In  the  former  case,  a  memory-to- 
memory  copy  of  the  data  is  performed,  and  in  the  latter  a  message  is  sent  to 
the  target  process. 

Two  way  addressing  between  the  four  levels  is  proposed  to  simplify  coding; 
blocks  can  be  addressed  by  process  and  process  by  blocks,  and  similarly  for 
processor/process,  and  block/patch.  Direct  addressing  links  are  summarised 
by  the  arrows  in  Figure  7.4. 


7.4  Uniblock  Data  Storage 

Particle  and  mesh  data  in  each  uniblock  will  be  stored  in  a  similar  fashion  to 
that  described  in  [5]  Particle  coordinates  are  grouped  by  species,  and  mesh 
data  is  mapped  onto  one  dimensional  arrays. 

The  nodes  for  the  mesh  data  are  indexed  as  follows 

•  Each  block  of  the  multiblock  decomposition  of  the  computational  domain 
comprises  n,  x  n2  x  n3  elements. 

•  The  active  block  domain  comprises  elements  numbered  (0,  0,  0)  to  ( ni  - 

1,  n2  -  1 1  n3  -  1)- 

•  The  nodes  in  the  active  block  are  labelled  (i,j,  k)  =  (0,  0,  0)  to  (nj,n2,  n3). 
Where  necessary  (ie  on  the  i  =  i»i,j  =  nj  and  k  =  n3  planes),  an  extra 
padding  layer  of  elements  is  introduced. 

•  Optionally,  there  may  be  extra  bordering  layers  of  elements  introduced 
onto  the  block.  The  depth  of  the  border  is  (Z&i,/^,/^)  in  the  ( i,j,k ) 
directions  respectively. 

The  meshing  of  the  block  is  illustrated  in  Figure  7.5.  There  are  N„  = 
( na  +  2/6q  +  1)  elements  and  nodes  in  the  a  direction. 

The  total  number  of  data  values  stored  for  each  scalar  field  (or  vector 
component)  is 

iV|  x  JVj  x  JV3  (7.4.1) 

The  node  locations  on  each  element  are  sketched  in  Figure  7.6  and  summarised 
in  Table  7.1.  The  locations  are  the  positions  of  the  nodes  for  the  unit  cube 
element  in  curved  space. 
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Figure  7.5:  A  2-D  illustration  of  the  meshing  of  a  block.  The  field  is  solved 
in  the  active  domain,  the  padding  layer  is  used  to  simplify  addressing,  and 
provision  for  a  bordering  layer  is  made  for  possible  future  use  in  particle  mesh 
applications. 
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Figure  7.6:  The  location  of  nodes  on  the  unit  cube  element.  Crosses  give  E,, 
/’  and  <t  node  locations,  open  circles  give  Hi  and  b'  locations  and  solid  circles 
give  position  and  scalar  potential  node  locations. 
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Table  7.1:  Node  locations  and  active  index  ranges 


components 

locations 

active  index  range  of  nodes 

d 

(Z1) 

l*00](*01)(J10)($ll) 

(0,  n,  —  1)  (0,n2)  (0,n3) 

Ei 

<P 

(P) 

[0*0](0ll)(ll0)(l|l) 

(0,  r»i )  (0,n2-l)  (0,  n3) 

E3 

<P 

(P) 

[00i](0ll)(10i)(lli) 

(OjlHlft) 

lj  j0](|  J1) 
[000](100)(010)(001) 
(111)(011)(101)(110) 

(0,n,)  (0,n2)  (0,  n3  —  1) 

Hx 

bl 

(0,n,)  (0,  n2  —  1)  (0,n3  —  I) 

Hi 

IP 

(0,  n,  -  1)  (0,  n2)  (0,  n3  —  1) 

Hz 

b3 

(0,n,  -  1)  (0,  n2  —  1)  (0,n3) 

(0,  n,  —  1)  (0,n2  —  1)  (0,  n3  —  1) 

Only  those  nodes  with  the  smallest  indices  are  deemed  to  belong  to  the 
element;  in  the  Fortran  implementation  these  nodes  have  the  same  indexing  as 
the  element.  In  Table  7.1,  the  node  belonging  to  the  current  element  is  shown 
in  square  braces  [...],  the  remaining  locations  are  nodes  on  its  boundaries 
belonging  to  neighbouring  elements  -  hence  the  use  of  a  padding  layer. 

To  simplify  the  coding  of  the  cyclic  interchange  of  indices  and  of  compressed 
storage  of  basis  vectors  and  metrics,  the  multidimensional  elements  described 
above  will  be  mapped  onto  one  dimensional  arrays  in  the  Fortran  coding.  The 
mapping  will  be  performed  as  follows:- 

•  By  (arbitrary)  choice,  the  components  of  a  vector  field  are  stored  as  three 
successive  scalar  fields.  (This  enforces  D  —  1  below). 

•  All  three  dimensional  scalar  fields  are  explicitly  mapped  onto  one  dimen¬ 
sional  arrays  as  follows:- 

Element  i,  j,  k  is  stored  in  location 

loc(ijk)  =  fl  +  ix/,+;  x  l2  +  k  x  /3  (7.4.2) 

where 

n  =  mesh  origin 

l\  =  D  —  dimension  (1  for  scalar,  3  for  triplet,  etc) 
h  =  h  x  TV, 

I3  —  lj  x  N 2 

Na  =  na  +  1  +  2  lba 

The  origin  is  chosen  so  that  (i,j,  k )  corresponds  to  the  element  with  its 
corner  at  the  origin  of  the  active  domain.  Thus,  to  avoid  overwriting 
before  the  first  element  of  the  array  the  origin  H  must  satisfy 

f!  >  Hmin  =  1  +  lb\  x  /,  +  U>2  x  lj  +  163  x  [3  (7.4.3) 
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•  If  vectors  are  stored  as  successive  components,  then  the  successive  mesh 
origins  are  at 


fii 

=  n1  +  NlNiNa 

n3  =  n2  +  MNj7v3 

The  advantages  of  this  data  storage  method  are 

1.  it  allows  the  same  code  to  be  used  for  1,2  and  3  dimensional  cases. 

2.  the  same  code  is  applied  to  all  three  components  by  using  an  outer  loop 
which  cyclically  loops  through  components. 

3.  by  defining  separately  the  increments  for  the  metric  tensor  components 
(and  for  basis  vectors  in  the  particle  acceleration  computation),  com¬ 
pressed  storage  of  metrics  can  be  employed. 

The  storage  advantage  of  the  last  item  becomes  apparent  when  comparing 
storage  of  Gu,  say,  for  the  extreme  cases  of  a  uniform  cartesian  block  and  a 
general  curvilinear  block  in  3-D:  The  former  requires  one  floating  point  number 
to  be  stored,  whilst  the  latter  requires  rij  x  n2  x  n3. 
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D  Annex  3:  Running  mimdpic  on  the  iPSC 


LPM2  BENCHMARK 


(Intel  native  communications) 

Roger  Hockney  and  James  Eastwood 
May  1993 


First  Read  the  directory  picpac2.d  (7.5MB)  from  the  Sun  tar  cartridge  tape 
to  the  computer  file  system,  with  a  UNIX  command  like  (use  150MB  drive) : 

tar  xvf  /dev/rstO  picpac2.d  ...  on  a  SUN 

or  tar  xvf  /dev/rmtO  picpac2.d  ...  on  an  IBM 

The  following  is  the  ReadMe  file  in  directory  picpac2.d 


DESCRIPTION 


The  Local  Particle  Mesh  (LPM2)  Benchmark  was  written  for  the  USAF  to  measure 
the  parallelization  properties  of  the  new  Electro-Magnetic  PIC  code  for  the 
simulation  of  MILO  type  devices  on  massively  parallel  computers  (MPPs), 
such  as  the  Intel  iPSC/860  and  Intel  Paragon. 

This  version  uses  the  Intel  native  communication  library  (csend,  crecv  etc) . 
Conversion  to  other  systems  can  be  made  by  placing  appropriate  alternative 
communication  calls  in  subroutines  NODASG  (clsl2z.f),  XPATCH  (c2s31z.f)  and 
benctl.f  only. 

The  program  can  be  run  with  or  without  graphics.  Simple  graphics  is  provided 
to  use  as  a  demonstration  and  as  a  check  on  the  validity  of  the  calculation. 
The  geometry  of  the  device  is  drawn  before  the  timing  period,  with  the  region 
computed  by  each  process  shown  in  a  different  color  (although  there  are  only 
four  colors  which  are  cycled  through) .  Only  the  boundaries  of  the  blocks  are 
drawn,  because  the  mesh  itself  is  too  fine.  After  the  end  of  the  timing 
interval,  all  the  particles  are  drawn  (in  the  appropriate  color  for  each 
process)  on  top  of  the  device  diagram.  Printed  output  of  the  benchmark 
timing  data  and  performance  is  sent  to  the  screen  and  also  to  an  output  file. 

The  device  is  made  up  of  blocks,  and  parallelization  is  achieved  by  assigning 
blocks  to  processes  within  the  program  (NODASG  and  SETDMP),  and  external  to 
the  program  by  assigning  processes  to  processors  (or  nodes) .  The  program 
is  designed  to  allow  dynamic  allocation  of  blocks  to  processes,  but  the 
benchmark  program  uses  a  fixed  allocation.  Allocation  of  processes  to 
processors,  depends  on  facilities  provided  by  the  parallel  computer.  In  this 
version  for  the  Intel  computers,  we  assume  one  process  to  each  processor. 

Interfacing  the  graphics  to  a  new  computer  system  is  tricky,  and  we  recommend 
that  benchmark  times  be  made  without  the  graphics  display.  The  program 
has  in  internal  check  on  the  validity  of  the  calculation,  based  on  comparing 
the  total  number  of  particles  at  the  end  of  the  benchmark  run  with  a 
reference  number  (obtained  from  a  valid  one  processor  calculation) .  If  these 
numbers  agree  to  better  than  10%,  then  the  calculation  is  regarded  as  valid. 
Exact  agreement  cannot  be  expected  because  of  the  use  of  random  numbers  for 
particle  injection.  There  seems  to  be  no  way  of  ensuring  that  a  multi-process 
run  uses  the  same  random  numbers  for  the  same  purposes  as  a  single-process 
run,  therefore  we  can  only  expect  approximate  agreement  on  a  statistical 
basis. 


CASES 


This  initial  benchmark  tape  provides  for  two  cases: 


easel:  a  small  MILO  problem  with  5  cavities  and  12  blocks  and  about  ^ 

100  particles,  which  has  been  used  as  a  test  calculation  during 
program  development.  It  is  small  enough  to  run  on  any  workstation 
and  give  a  reasonable  real-time  display  when  run  on  one  processor. 

It  is  too  small,  however,  to  make  sensible  use  of  a  massively 
parallel  computer,  and  the  speedup  behavior  will  be  disappointing. 

case2:  a  moderately  sized  problem  with  31  cavities  and  64  blocks,  * 

and  about  12000  particles,  which  should  be  big  enough  to  show 
reasonable  speedup  behavior  on  a  massively  parallel  computer. 

a  further  case  is  planned: 

case3:  (to  be  created  after  further  knowledge  of  and  some  experience  • 

with  the  USAF  MPP)  a  massive  problem  tuned  to  make  the  most  of 
the  massively  parallel  computer  being  used. 


DIRECTORIES 


The  following  directories  are  provided  on  the  benchmark  tape,  only  the 
first  is  needed  for  benchmarking 

picpac2.d  (7.4MB)  LPM2  benchmarking  directory,  for  iPSC 

without  graphics 

The  following  may  be  used  to  demonstrate  the  program  on  a  Sun 

picsim.d  (8.6MB)  LPM2  benchmarking  directory,  for  Sun 

simulator  and  on-line  (xgenie)  graphics 

the  following  directories  are  not  needed  for  elementary  benchmarking  and 
can  usually  be  ignored.  They  may  occasionally  be  needed  to  get  out  of 
trouble : 


ipscsim.d  (4.5MB) 
lpmlibm.d  (2.1MB) 
XGENIE  (1.6MB) 
XGENIE. NEW (1.3MB) 


iPSC  simulator 
LPM1  benchmark 

source  for  xgenie  graphics  (not  multiplexed) 
ditto  for  multiplexed  graphics 


OPERATING  INSTRUCTIONS  (benchmark  without  graphics  -  UNIX  commands) 


The  files  on  tape  are  setup  for  benchmark  runs  on  an  Intel  iPSC/860,  which 
may  be  run  as  follows: 

(1)  cd  picpac2.d  -  go  to  benchmark  directory 


(2) 

cp  casel.dat  input 

.  dat 

(3) 

get cube  -tl 

SB 

(4) 

sh  host.sh 

S> 

(5) 

-  results 

to  screen  and 

(6) 

vi  input.dat 

-  edit  line-2 

change  last 

(7) 

getcube  -t2 

-  -t2  gets  2 

(8) 

sh  host 

copy  easel  1-proc  data  to  input.dat 
which  is  used  as  a  working  input  file 
get  a  cube  of  1-processor 
load  and  start  running 
file  o_caselpl:l 

to  set  NPRES*2,  for  2  processes 
character  to  number  of  processes 
nodes 


-  • 


(9) 


-  results  to  screen  and  files: 

o_caselp2:l  (process  1) 
o_caselp2:2  (process  2) 

(10)  repeat  steps  (6)  to  (9)  for  at  least  10  different  numbers  of  processes 
(-NPRES),  roughly  equally  spaced  logarithmically: 

e.g.  3,4,5,6,8,10,12,16,20,24,32,40,50,64  on  64-node  iPSC 
The  separate  output  files  for  each  process  are  generated  automatically 

o_case<n>p<m> :<r> 

output  for  process  <r>  from  <m>-process  run  of  case<n> 

(11)  cp  case2.dat  input.dat  -  bring-in  case2  data  for  1-proc  run 

(12)  repeat  steps  (3)  to  (10)  using  case2  data 

Example  output  can  be  found  in:  picpac2 . d/pac_o_*  and  picsim.d/o_* 

BENCHMARKING  COMPLETE  (easel  and  case2) 


RECOMPILATION 


The  benchmark  directory  contains  executables  which  have  run  on  an  iPSC/860, 
and  should  be  able  to  be  used  as  described  above.  However  if  they  do  not 
work,  or  if  it  is  desired  to  recompile  and  link  at  a  different  level  of 
optimization,  new  executables  can  be  produced  as  follows,  using  the  UNIX 
makefile  facility.  The  following  makefiles  are  provided: 


makefile 
makefile . iPSC 
makefile . sim 

to  use  any  of  these: 


-  the  working  makefile 
=  to  make  executables  for  an  iPSC/860 
*  to  make  executables  to  use  with  the  Intel 
iPSC  simulator,  running  on  a  SUN  workstation 


cp  makefile . iPSC  makefile  -  setup  for  real  iPSC 

make  *  recompile  and  link  all  changed  files 

and  produce  executable  called  'node' 

The  instructions  above  run  a  shell  script  called  'host.sh'  to  load  the 
executable  'node'  onto  all  nodes  and  start  execution.  After  all  nodes  have 
finished  the  cube  is  released. 


USING  GRAPHICS 


Two  forms  of  graphics  are  possible: 

(1)  GHOST  interface  -  Off-line  graphics  is  provided  if  the  Culham  laboratory 

-  Ghost  graphics  library  is  available.  Each  node  can  be 

made  to  write  a  separate  file  of  graphics  instructions 
that  can  be  subsequently  processed  and  viewed,  using  standard  Ghost 
interactive  facilities. 


(1.1)  cp  .F0R_0_LIS. ghost  .F0R_0_LIS . inc 

(1.2)  cp  lib_lis .ghost  lib_lis.inc 

(1.3)  cp  makefile . iPSC  makefile 
or  (1.3a)  cp  makefile. sim  makefile 

(1.4)  make 

(1.5)  operate  as  benchmark  program  above 
The  graphics  output  files  generated  are: 


removes  ghost  dummies 
in  cpslz.f 

include  ghost  library 
to  use  real  iPSC 
to  use  simulator 
make  new  'node' 


g_case<n>p<m> :<r> 

ghost  graphics  grid  file  for  process  <r>  from  <m>-process  run  of  case<n> 

(2)  XGENIE  interface  -  On-line  graphics  is  provided  via  the  xgenie  daemon 

-  which  is  attached  to  the  running  host  program  via 

a  UNIX  pipe.  All  nodes  write  coded  graphics 
instructions  to  the  host  program  using  standard  Intel  csend/crecv 
instructions.  After  loading  the  node  program  onto  the  iPSC/860  nodes, 
the  host  program  goes  into  an  endless  loop  reading  coded  graphics 
instructions  from  the  nodes,  and  sending  them  over  the  pipe  to  xgenie, 
which  plots  them  in  an  X-window,  previously  opened  by  the  host.  A 
control-C  ends  the  host  program  when  required.  This  also  kills  the 
nodes  (usually!).  The  xgenie  on-line  graphics  has  been  setup  to  run 
with  the  SUN  simulator  in  directory: 

picsim.d 

This  may  be  used  to  demonstrate  the  paralellization  of  the  program,  but 
cannot  be  used  to  obtain  timing  or  performance  numbers. 


(2.1) 

(2.2) 

(2.3) 

(2.4) 
or  (2 . 

(2.5) 

(2.6) 


cp  . F0R_0_LIS . noghost  . F0R_0_LIS . inc 
cp  lib_lis . noghost  lib_lis.inc 
cp  host.f.graf  host . f 
cp  makef ile . iPSC  makefile 
4a)  cp  makefile. sim  makefile 
make 


inserts  ghost  dummies 
no  libraries  required 
host  program  +  grafloop 
to  use  real  iPSC 
to  use  simulator 
maxe  node  program 


vi  xgenie. h  line  81  put  correct  directory  path  into 
definition  of  Def aultServerPath  to  reach 
picsim. d/xgenie .  Use  absolute  path  from  root. 
(2.7)  make  xgenielF  compile  xgenie  FORTRAN 

interface  routines 

make  xgenie  make  xgenie  daemon 

make  host  make  host  program 


(2.8) 

(2.9) 


The  program  must  be  run  from  the  host  as  follows,  e.g.: 

(2.10)  cp  casel.dat  input.dat  setup  test  input 

(2.11)  v.  input.dat  edit  line-2  last  character  to  4 

(2.11)  getcube  -t4  get  4  nodes 

(2.12)  host  run  host  executable  on  host 

X-window  appears  and  device  is  drawn 
100  step  benchmark  run  is  done 
Final  particle  distribution  displayed 
(2.13>  kill  job  with  control-C 


If  particle  trajectories  are  required  as  computation  takes  place, 
uncomment  line  155  (CALL  SCAPLT)  in  c3sl.f  (OUTPUT),  and  type  'make'. 


i 


